What could be much better than RIFE for interpolating video frames?
Let me start with the basics: why do we use RIFE and SVP at all?
Simple: a typical video has 24 frames per second, and with RIFE and SVP we can create intermediate frames so that, for example, with 5x interpolation, we get 120 frames per second and a smooth experience watching a movie or series.
The problem is that this 120 fps is far from what we would experience if the same movie had been natively recorded at 120 fps. My point is not simply that RIFE is imperfect; no algorithm can losslessly restore what was happening in front of the camera.
The point is that typical 24 fps footage contains artefacts, in the form of motion blur, deliberately added to trick our brains into perceiving smooth movement. As in photography, each video frame has a specific exposure time. In photography, apart from a few artistic exceptions, we want the image to be sharp, with no blurring, and therefore the exposure time to be as short as possible. In 24 fps video the opposite is true: the long exposure time of each of the 24 frames is meant to simulate the effect of a higher frame rate through blurring. The result is that during a running scene, for example, you don't see the runner's hand at all, just a blurry patch. This is the case at 24 fps with a typical film exposure time of 1/48 s. You can read more about this here: https://www.red.com/red-101/shutter-angle-tutorial
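The 1/48 s figure follows from the standard 180-degree shutter described in the linked tutorial. A minimal sketch of that relationship (the function name and defaults are my own, for illustration):

```python
def exposure_time(fps: float, shutter_angle: float = 180.0) -> float:
    """Exposure time in seconds for a given frame rate and shutter angle.

    A 360-degree shutter exposes for the whole frame interval (1/fps);
    a 180-degree shutter exposes for half of it, and so on.
    """
    return (shutter_angle / 360.0) / fps

# 24 fps with the conventional 180-degree shutter -> 1/48 s per frame
print(exposure_time(24))    # ~0.0208 s, i.e. 1/48 s

# Natively shot 120 fps with the same 180-degree shutter -> 1/240 s,
# which is why each frame is far sharper than an interpolated one.
print(exposure_time(120))   # ~0.0042 s, i.e. 1/240 s
```

This also shows why interpolating 24 fps footage cannot match native high-frame-rate footage: each interpolated frame inherits the long 1/48 s blur of its 24 fps sources.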
If the run in the example above had been recorded at 120 or 240 fps, the runner's hand would be sharp and clear in every frame. RIFE and other typical AI algorithms do well at interpolating 24 fps to 120 fps when the exposure time of each frame is very short, say 1/240 s, and the details in each frame are sharp. However, there are no such videos, well, unless we shoot them ourselves, but then the question arises: why not record at 120 or 240 fps in the first place? Then no RIFE would be necessary. Unfortunately, in most cases we have to make do with what we have: 24 fps and heavily blurred motion. In such a 24 fps video, the runner's hand is blurred in the original, and it stays blurred in the 120 fps created by RIFE.
This is where the problem lies. We have a smooth 120 fps video obtained by RIFE interpolation, yet our brain still perceives that something is wrong. Something feels artificial, and many people hate interpolation precisely for this. Many explain it to themselves by saying that the magic of cinema is gone, that this was not the director's intention, that the result is the 'soap opera effect'... but these are just rationalisations of the fact that our brain hates what it sees. Can it be explained somehow?
Yes, it can be explained by the 'uncanny valley' effect: https://en.wikipedia.org/wiki/Uncanny_valley The whole magic of cinema is nothing more than the fact that 24 fps is so far from reality that our brain immediately knows it is watching a movie, entertainment rather than reality. In uncanny-valley terms, it is like an industrial robot: we are 100% sure it is a robot, so it causes us no concern or uncertainty about what we are dealing with.
Now let's get back to what video frame interpolation does: 120 fps takes us away from the magic of cinema and brings us closer to reality. Of course, we all want movies to be more immersive and as real as possible: 3D, higher resolution, more frames per second. The problem is that RIFE interpolation leaves too much artificiality in the image (motion blur even at 120 or 240 fps), so as we approach reality we fall straight into that famous 'uncanny valley'. The film is smooth at 120 fps, but something is wrong with it: every frame carries blur that is unnaturally long for 120 fps. Despite the 120 fps, certain details, instead of being clear and sharp, remain blurred and distorted by motion blur.
Of course, we accept the compromises because we know RIFE is still the best way to get closer to the ideal, and we have no alternative, just as for many years we accepted artifacts from the basic SVP algorithm because there was no alternative. Still, there are people who, when shown the results of RIFE or SVP interpolation, will still prefer the original 24 fps. Often this is due not to stubbornness or conservatism, but to the simple fact that there is something wrong with the interpolated image.
My dream is to interpolate the frames of a 24 fps movie in such a way that we get the effect we would get from watching that movie shot natively at 120, 240 or 500 fps. Yes, we already have 500 Hz monitors, and this effect can be achieved!