Thank you for your comprehensive reply. I did not understand everything, but for me personally this passage was the most important:

blackmickey1007 wrote:

RIFE only supports RGB format, but VapourSynth Filter does not support RGB format output, so you must convert RGB format to YUV format.

Did I understand correctly that you want to watch 4K UHD HDR video (10bit) interpolated with RIFE?

If I understood correctly, such a conversion would give better quality and resolution than interpolating the same material from a 1080p YUV420 source, whose colour information is encoded at 540p anyway.

By converting from 4K UHD HDR video (10-bit), do you want to preserve full 1080p resolution throughout the chain?

blackmickey1007, could you explain to me how exactly this works with RIFE?

I found this: https://github.com/n00mkrad/flowframes/issues/123 and I think this is the same thing you write about.

I'll try to describe how I understand it, and you can correct me if I'm wrong somewhere:


Real-time interpolation:

Step 1: RIFE converts all original frames from YUV420 to RGBS - lossless conversion for colours.

Step 2: RIFE adds the interpolated frames - no effect on colours.

Step 3... and here I have doubts. An encoder, for example FFmpeg, converts all the original and interpolated frames from RGBS to YUV420, and here there is an obvious loss of colour information.
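To illustrate why step 3 loses colour information: in 4:2:0 each chroma plane has half the width and half the height of the picture, so a 1080p video carries 960x540 chroma and a 4K UHD video carries 1920x1080 chroma. Below is a minimal pure-Python sketch of that idea. It is not what FFmpeg or VapourSynth do internally (real converters use better resampling filters), and the function names are made up for illustration.

```python
# Illustrative only: 4:2:0 keeps one chroma sample per 2x2 block of pixels.

def chroma_plane_size(width, height):
    """Size of each chroma (U or V) plane of a 4:2:0 frame."""
    return (width + 1) // 2, (height + 1) // 2


def subsample_420(plane):
    """Average every 2x2 block of a full-resolution chroma plane.

    Assumes the plane has an even width and an even height.
    """
    out = []
    for y in range(0, len(plane), 2):
        row = []
        for x in range(0, len(plane[0]), 2):
            block = [plane[y][x], plane[y][x + 1],
                     plane[y + 1][x], plane[y + 1][x + 1]]
            row.append(sum(block) / 4)
        out.append(row)
    return out


def upsample_nearest(plane):
    """Repeat each chroma sample over its 2x2 block (simplest upsampler)."""
    out = []
    for row in plane:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# A sharp colour edge that does not line up with the 2x2 grid.
original = [
    [0, 255, 255, 255],
    [0, 255, 255, 255],
]
round_trip = upsample_nearest(subsample_420(original))

print(chroma_plane_size(1920, 1080))  # chroma size of a 1080p 4:2:0 frame
print(chroma_plane_size(3840, 2160))  # chroma size of a 4K UHD 4:2:0 frame
print(round_trip != original)         # the edge is smeared after the round trip
```

The edge in the first column is blended with its neighbour, which is the kind of colour smearing I am worried about in step 3.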

...but how does it work with real-time interpolation? After all, it is standard for a PC to send an image in RGBS format to the monitor, for example from a word processor, so that we have clear black text on a white background without colour artefacts.

Can't RIFE, or rather VapourSynth (ncnn Vulkan) send frames in RGBS format directly to the monitor? I'm not familiar with coding, so I don't understand everything about the RIFE filter for VapourSynth (ncnn Vulkan) code: https://github.com/HomeOfVapourSynthEvo … cnn-Vulkan

I know that later information from this filter is captured by SVP and passed to mpv. Is it there that some other conversion happens that affects the colour change?

My desire to use RIFE for real-time interpolation rather than encoding comes from the fact that I would like to retain as much information from the original video as possible, including colour information.

I see that you have a lot of knowledge about colour conversion and maybe you can explain to me what is needed and at what stage to enjoy the interpolation quality of the RIFE filter for VapourSynth (ncnn Vulkan) in real time without losing the original colour information?

RIFE Model 4.0 Benchmarks


720p - x2 interpolation - RIFE filter for VapourSynth (ncnn Vulkan):

70fps - NVIDIA GeForce RTX 2060 Mobile - Chainik
https://www.svp-team.com/forum/viewtopi … 158#p80158
92.956fps - NVIDIA GeForce RTX 2070 - blackmickey1007
https://www.svp-team.com/forum/viewtopi … 219#p80219
188.2fps - NVIDIA GeForce RTX 3070 Ti - dlr5668
https://www.svp-team.com/forum/viewtopi … 163#p80163


720p - x2 interpolation - RIFE filter for VapourSynth (PyTorch CUDA):

54.115fps - NVIDIA GeForce RTX 2070 - blackmickey1007
https://www.svp-team.com/forum/viewtopi … 219#p80219
82.4fps - NVIDIA GeForce RTX 3070 Ti - dlr5668
https://www.svp-team.com/forum/viewtopi … 691#p79691
91.3fps - NVIDIA GeForce RTX 3080 Ti - Quaternions
https://www.svp-team.com/forum/viewtopi … 727#p79727


1080p - x2 interpolation - RIFE filter for VapourSynth (PyTorch CUDA):

42.5fps - NVIDIA GeForce RTX 3070 Ti - dlr5668
https://www.svp-team.com/forum/viewtopi … 699#p79699
45.0fps - NVIDIA GeForce RTX 3080 Ti - Quaternions
https://www.svp-team.com/forum/viewtopi … 719#p79719

-----
The 720p file is the original demo video from the creator of RIFE at: https://github.com/hzwer/arXiv2020-RIFE
720p (1280x720), 25FPS, 53 s 680 ms, 4:2:0 YUV, 8 bits
direct link: https://drive.google.com/file/d/1i3xlKb … sp=sharing

The 1080p file was chosen arbitrarily.

dlr5668 wrote:

https://i.imgur.com/uvttbqJ.png

demo 1 720p file - v4 model - 3070ti from https://www.svp-team.com/forum/viewtopi … 721#p79721

Thanks for the test! Amazing performance boost!

Interpolation x2 and 188.2fps means 94.1 original frames and 94.1 interpolated frames per second. This means that we can have real-time x4 interpolation without any problem - 25 original frames and 75 interpolated frames!!!!!

Am I counting correctly?
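Here is a quick way to check this count. It is only a sketch, and it rests on two assumptions of mine: that the benchmark fps counts output frames, and that each interpolated frame costs roughly the same GPU time whether the multiplier is x2 or x4. The function names are made up for illustration.

```python
def interpolated_per_second(output_fps, multiplier):
    """Interpolated frames produced per second at a given output rate."""
    return output_fps * (multiplier - 1) / multiplier


def needed_interpolated_per_second(source_fps, multiplier):
    """Interpolated frames per second needed to keep up with playback."""
    return source_fps * (multiplier - 1)


# Figures from the thread: 188.2 fps at x2 on an RTX 3070 Ti, 25 fps source.
budget = interpolated_per_second(188.2, 2)
needed_x4 = needed_interpolated_per_second(25, 4)

print(f"budget: {budget:.1f} interpolated frames/s")
print(f"x4 needs: {needed_x4} interpolated frames/s")
if needed_x4 <= budget:
    print("x4 in real time looks feasible under these assumptions")
else:
    print("x4 in real time does not fit the budget")
```

Only a real x4 test can confirm this, because x4 may load the GPU differently than x2.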

dlr5668, have you tried x3 and x4 real-time interpolation with this new RIFE filter?
Does the "gpu_thread" setting affect fps, %GPU, VRAM?

dlr5668 wrote:

Can we enable rife profile as default ? I want to use it for <1080p content. Regular rules dont work

I join in with your request. I would also love to use such a setting in the future.

Thanks, this 70fps is hugely impressive, not only compared to the previous 8fps but also compared to the best results posted here earlier:


720p - SVP & RIFE filter for VapourSynth (PyTorch CUDA) Model 4.0:

82.4fps NVIDIA GeForce RTX 3070 Ti - dlr5668
https://www.svp-team.com/forum/viewtopi … 691#p79691
91.3fps NVIDIA GeForce RTX 3080 Ti - Quaternions
https://www.svp-team.com/forum/viewtopi … 727#p79727


1080p - SVP & RIFE filter for VapourSynth (PyTorch CUDA) Model 4.0:

42.5fps NVIDIA GeForce RTX 3070 Ti - dlr5668
https://www.svp-team.com/forum/viewtopi … 699#p79699
45.0fps NVIDIA GeForce RTX 3080 Ti - Quaternions
https://www.svp-team.com/forum/viewtopi … 719#p79719

Chainik wrote:

SVP updated.

rtx 2060, 720p: ~70 fps with v4 model, ~35 fps with v3 model (was ~8 fps before)
real time: works in mpv but for some unknown reason doesn't work well in MPC-HC + Vapoursynth Filter (only gives 0.25 SVP index)

Is the 70fps result for SVP & RIFE filter for VapourSynth (PyTorch CUDA) or for
SVP & RIFE filter for VapourSynth (ncnn Vulkan)?

Thanks for the tests and update :)

Thanks!!! Great news!!!

Chainik, could you check how much faster it gets with SVP and the new VapourSynth-RIFE-ncnn-Vulkan filter?

You gave us the results for the base RIFE ncnn Vulkan 20220330:

Chainik wrote:

> original demo video from the creator of RIFE

RTX 2060

model 3.1
20210520 - 4:30
20220228 - 4:30
20220313 - 2:30
20220330 - 1:10

model 4.0
20220228 - 1:05
20220330 - 0:54

model 4.0 also loads CPU a lot (30% of 4800H in the Task manager)


Which gives us the following fps result:

original demo video from the creator of RIFE at: https://github.com/hzwer/arXiv2020-RIFE
720p (1280x720), 25FPS, 53 s 680 ms

25FPS*53.68=1342frames

1342frames*2/54s=49.7fps
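The same calculation can be applied to all of the timings quoted above. This is a small sketch under the same assumption I made by hand, namely that each timing covers x2 interpolation of the whole demo clip; the function names are made up for illustration.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string such as '0:54' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def output_fps(elapsed, source_fps=25, duration_s=53.68, multiplier=2):
    """Output frames per second for one timed run of the demo clip."""
    source_frames = round(source_fps * duration_s)  # 1342 frames
    return source_frames * multiplier / to_seconds(elapsed)


# Timings quoted above for the RTX 2060 (model / release -> elapsed time).
timings = {
    "3.1 / 20210520": "4:30",
    "3.1 / 20220228": "4:30",
    "3.1 / 20220313": "2:30",
    "3.1 / 20220330": "1:10",
    "4.0 / 20220228": "1:05",
    "4.0 / 20220330": "0:54",
}

for label, elapsed in timings.items():
    print(f"model {label}: {output_fps(elapsed):.1f} fps")
```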


Chainik, what results do you get on your hardware for:

1. SVP & RIFE filter for VapourSynth (PyTorch CUDA) re-encoding with x2 interpolation; RIFE model: 4.0; scale=1.0; FP16 or FP32 (whichever is faster)

2. SVP & RIFE filter for VapourSynth (ncnn Vulkan) re-encoding with x2 interpolation; RIFE model: 4.0

I would be very grateful if you could run these tests. It's important that we have a comparison of these two tests along with your earlier result, 54 seconds (49.7fps), on the same hardware.

Is there any difference when you change gpu_thread?

What are your impressions with multiplier set to 3 for the 4.0 model?

Chainik, the results you presented are very promising in terms of using a lighter version of RIFE together with SVP for real-time motion interpolation.

Now I am intrigued by an anomaly that has come to light through testing on this thread:

Flowframes:

lwk7454 wrote:

I've tried Vulkan implementation too, but it only has RIFE 3.1 instead of 3.8:
720p, FP16
FPS: 36.79
Compute_1: 98%
Cuda: 0%
720p, FP32
FPS: 38.85
Compute_1: 98%
Cuda: 0%

Interesting to see Flowframes RIFE with Vulkan has much better performance than SVP.

https://www.svp-team.com/forum/viewtopi … 531#p79531


SVP:

lwk7454 wrote:

I've also tried TTA Disabled for comparison, just 1 test:
720p, FP32
FPS: 25.1
Compute_1: 100%
Cuda: 15%

https://www.svp-team.com/forum/viewtopi … 526#p79526


Flowframes probably uses RIFE ncnn Vulkan directly: https://github.com/nihui/rife-ncnn-vulkan

SVP on the other hand uses RIFE filter for VapourSynth, based on RIFE ncnn Vulkan: https://github.com/HomeOfVapourSynthEvo … cnn-Vulkan

I don't know whether this is the reason Flowframes performs about 50% better than SVP, or whether some settings are simply chosen differently. Either way, I have a big request: could you check how it looks on your side? We already have the results of the test you did for the base RIFE ncnn Vulkan tool:

Chainik wrote:

RTX 2060

3.1 model
20210520 - 4:30

https://www.svp-team.com/forum/viewtopi … 140#p80140

If we had comparison data for the following 3 variants, it would be easier to predict where the difference comes from:

4:30 RTX 2060 model 3.1 RIFE ncnn Vulkan 20210520
?:?? RTX 2060 model 3.1 SVP+RIFE filter for VapourSynth, based on RIFE ncnn Vulkan
?:?? RTX 2060 model 3.1 Flowframes

The free version of Flowframes still probably uses the 3.1 RIFE model.

I know, in light of new testing this old model is now obsolete, but it is still the basis of the RIFE filter for VapourSynth (ncnn Vulkan). If it turns out that there are actually differences between the results of these 3 tests, I think you could figure out from the code where the difference arises in the software. If it's the RIFE filter for VapourSynth, then we could ask HolyWu to include possible changes in a new update: https://github.com/HomeOfVapourSynthEvo … /issues/14

Chainik, is it possible to swap the model to 4.1 instead of 4.0 and do the test also on the latest model? I have made a request here: https://github.com/nihui/rife-ncnn-vulkan/issues/43
but we will probably wait a bit for an update.

Chainik wrote:

> original demo video from the creator of RIFE

RTX 2060

model 3.1
20210520 - 4:30
20220228 - 4:30
20220313 - 2:30
20220330 - 1:10

model 4.0
20220228 - 1:05
20220330 - 0:54

model 4.0 also loads CPU a lot (30% of 4800H in the Task manager)

Many thanks to Chainik for the tests! Even more so, for doing more tests and capturing the differences between the different versions.

Amazing progress!

So, if anyone has the knowledge and skills to run RIFE ncnn Vulkan and compare performance, I would greatly appreciate it.

RIFE ncnn Vulkan versions for comparison:

1. Release 20210520: https://github.com/nihui/rife-ncnn-vulk … g/20210520
RIFE 3.1

The current VapourSynth filter used by SVP is based on this version: https://github.com/HomeOfVapourSynthEvo … cnn-Vulkan

2. Release 20220330: https://github.com/nihui/rife-ncnn-vulk … g/20220330
RIFE 4.0

It is important to use both versions to get an idea of how much the performance of the new version of the VapourSynth filter should increase by, which we request here: https://github.com/HomeOfVapourSynthEvo … /issues/14


The file for the comparisons is the same one on which tests have already been performed several times in this thread:

original demo video from the creator of RIFE at: https://github.com/hzwer/arXiv2020-RIFE
720p (1280x720), 25FPS, 53 s 680 ms, 4:2:0 YUV, 8 bits
direct link: https://drive.google.com/file/d/1i3xlKb … sp=sharing

dlr5668 wrote:
UHD wrote:

A few hours ago:

RIFE ncnn Vulkan - Release 20220330:
https://github.com/nihui/rife-ncnn-vulk … g/20220330

"update ncnn with nvidia tensorcore optimization"

I'm extremely curious, how much of a performance boost can this give?

If it could get close to the performance of vs-rife: https://github.com/HolyWu/vs-rife that would be great :)

Dont think it will be faster than cuda version


In the case of the GeForce RTX 3090 graphics card, using CUDA Cores gives similar processing power for FP32 and FP16:

35.6 Peak FP32 TFLOPS (non-Tensor)
35.6 Peak FP16 TFLOPS (non-Tensor)

However, using Tensor Cores gives much more processing power than CUDA Cores and the difference between FP16 and FP32 is already double!

142/284 Peak FP16 Tensor TFLOPS with FP16 Accumulate
71/142 Peak FP16 Tensor TFLOPS with FP32 Accumulate

Data source - page 44:
https://images.nvidia.com/aem-dam/en-zz … per-V1.pdf


The same GPU gives similar performance results in Flowframes for both FP16 and FP32 precision using the RIFE CUDA/PyTorch version. With the old RIFE ncnn/Vulkan version there is also no clear performance difference between FP16 and FP32 precision.
https://www.svp-team.com/forum/viewtopi … 531#p79531

From this I conclude that both earlier versions of RIFE are based on the processing power of CUDA Cores.

According to the Ampere architecture based graphics cards specification I quoted above Tensor Cores should give better performance.

Of course, this is the theory. Practice should verify these assumptions. However, any performance increase will be welcome :)

Ante85 wrote:

I have tried all RIFE model with the program "Flowframes" and know that 3.8 is much better than all the others. Model 4 is fast, but comes with much more artifacts than 3.8 for example.

Try the 4.1 model, and if the result is worse than 3.8 then report this particular case to the RIFE developer, as here: https://github.com/hzwer/Practical-RIFE/issues/10


Ante85 wrote:

I like SVP much better than Flowframes when using RIFE, because with SVP the interpolation starts instantly, so I not have to wait for the program to identifying scen changes, extracting every frame in the movie, and then interpolating those frames on disk so I end upp with some million pictures on disk for every movie I want to interpolate. That is to much hammering on the SSD.

For millions of pictures it is best to use a RAM Disk, such as OSFMount:
https://chiaforum.com/uploads/default/original/2X/9/968639ce0ccef0ba373c7d9749a0909a9fc5ec57.png.


Ante85 wrote:

I also want to know if anybody have find a way to make RIFE in SVP work with 4k and not only 1080p, because I know that RIFE is able to handle that, because in Flowframes it works. When I try to use RIFE in SVP, it produces vertical lines in the video when transcoding 4K material, but only half of the screen gets this vertical artifacts.

4K and FP16 is a known issue: https://github.com/hzwer/arXiv2021-RIFE/issues/188


Ante85 wrote:

I managed to Install the Cuda capable RIFE in SVP but I have not understood what was really happening and why this can not be the default RIFE and so on. 6 gb pytorch is not normal? :-)))

Yes, it is normal. The RIFE ncnn/Vulkan version takes up much less space, but is much slower. At least it was. There is hope for a speedup, but it's not clear how big. See next post...

And for a few days now we have also had a new v4.1 model: https://github.com/hzwer/Practical-RIFE

Has anyone already tested the interpolation quality of the new model and its performance?

A few hours ago:

RIFE ncnn Vulkan - Release 20220330:
https://github.com/nihui/rife-ncnn-vulk … g/20220330

"update ncnn with nvidia tensorcore optimization"

I'm extremely curious, how much of a performance boost can this give?

If it could get close to the performance of vs-rife: https://github.com/HolyWu/vs-rife that would be great :)

Chainik wrote:

if you think that asking the same question ten times will change anything - you're wrong ;)

Of course, if you perceived my posts as time pressure then I sincerely apologise.

I am keen to provide feedback to the RIFE developer as soon as possible on the new 4.0 model in terms of performance and CUDA core loading on the latest graphics cards. I posted my request to the RIFE developer when the 3.8 model was available: https://github.com/hzwer/arXiv2020-RIFE/issues/217 However, we now have the 4.0 model which, as the tests posted here show, is a colossal leap in performance.

I am afraid that if there is no feedback, the RIFE developer will close the thread, concluding that since he added a new model and there are no comments, the issue is closed. So far we have tests done by Quaternions, which show that, paradoxically, the 4.0 model is not only faster but also probably less demanding on CUDA cores than the 3.8 model: https://www.svp-team.com/forum/viewtopi … 727#p79727

I write "probably" because we have test results for the 720p RIFE demo file showing the following load on the CUDA cores:

Cuda: 58%, FPS: 69.8, FP32, Model 3.8, NVIDIA GeForce RTX 3090
https://www.svp-team.com/forum/viewtopi … 723#p79723

Cuda: 46%, FPS: 91.3, FP32, Model 4.0, NVIDIA GeForce RTX 3080 Ti
https://www.svp-team.com/forum/viewtopi … 727#p79727

Because the tests were done on two different computers and two different graphics cards, we can't be entirely sure the same conclusion would hold when testing different models on the same hardware. This is why I asked 3 people in this thread to test both models or to complete the test for the missing model.

That said, the 3080 Ti and 3090 cards are pretty close in performance, and the difference of 12 percentage points in CUDA load between the two RIFE models is likely to be confirmed by tests on the same hardware. This would mean that despite the optimisation of the 4.0 model, the potential for further optimisation has paradoxically increased, because the CUDA load has decreased and, as you can see, is not the bottleneck here.

Now the most important thing: 91.3 fps indicates that we are very close to x3 interpolation of 720p files in real time! Perhaps this is already possible if x3 interpolation benefits from some synergy effect, whether better parallel processing, lower VRAM bandwidth needs, or better use of CUDA processing power. Without testing we don't know. And for testing we need your help, Chainik, in making this parameter https://github.com/HolyWu/vs-rife/blob/ … t__.py#L20 work with SVP.

With full x2 and x3 test results we will give complete feedback to the RIFE developer, and maybe he can find a way to make it even more effective by making better use of CUDA cores and maybe even Tensor cores.

Chainik wrote:

if you think that asking the same question ten times will change anything - you're wrong ;)

I know, but interpolating x10 with SVP & vs-rife on this 85" 4K 240Hz anti-reflection TV panel: https://www.displayspecifications.com/en/news/2ff1668 will change everything :D

Chainik wrote:

Instructions simplified (https://www.svp-team.com/forum/viewtopi … 695#p79695) cause Python 3.9 and Vapoursynth R57 are now installed/updated with SVP.

Thanks :)

What about the following parameter?
https://github.com/HolyWu/vs-rife/blob/ … t__.py#L20

Quaternions wrote:

I was able to get the CUDA graph to show up by disabling GPU hardware scheduling, the 3D reporting is very different from when GPU hardware scheduling was turned on but here is my results:

Thanks a lot for the tests :)

Would you be able to repeat these tests or at least one test - demo clip full (FP32) on 3.8 model with CUDA graph? I don't know if you have this version of RIFE.

dlr5668 wrote:

Playing a game or background virtual machine uses GPU

OK, thanks, now I understand.

dlr5668 wrote:

RTX3000 rife cuda load should be at 50% max since

Here are the results showing 66% maximum CUDA load for the 3090 and the 3.8 model:
https://www.svp-team.com/forum/viewtopi … 526#p79526

If you could do similar tests on your card for models 3.8 and 4.0, we would have confirmation of how the CUDA load has changed compared to the old model on the same graphics card. If, in addition to the performance increase, the CUDA usage has in fact decreased, it means that the potential for further optimization of the 4.0 model is even greater than for the 3.8 model and that the bottleneck is not CUDA but something else entirely, as I wrote here: https://github.com/hzwer/arXiv2020-RIFE/issues/217

Then we could add the results of such tests to the above post and maybe the RIFE developer will find a way to optimize. I think we should all care about that.

And here are the details of the test:

Fixed test parameters:

SVP & RIFE filter for VapourSynth (PyTorch)
re-encoding with x2 interpolation
scale=1.0

Variable test parameters:

Math precision: FP16 and FP32
RIFE model: 4.0 and 3.8

Test results:

re-encoding speed [FPS]
CUDA utilisation [%]

Video file:

original demo video from the creator of RIFE at: https://github.com/hzwer/arXiv2020-RIFE
720p (1280x720), 25FPS, 53 s 680 ms, 4:2:0 YUV, 8 bits
direct link: https://drive.google.com/file/d/1i3xlKb … sp=sharing
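To make it clear how many runs I am asking for, here is a small sketch that lists every combination of the variable parameters on top of the fixed ones. It only prints a checklist; it does not drive SVP or RIFE, and the dictionary keys are names I made up for illustration.

```python
from itertools import product

# Fixed for every run (taken from the test description above).
FIXED = {
    "filter": "RIFE filter for VapourSynth (PyTorch)",
    "multiplier": 2,
    "scale": 1.0,
}

# Varied between runs.
PRECISIONS = ["FP16", "FP32"]
MODELS = ["4.0", "3.8"]

# One dictionary per run: fixed settings plus one precision/model pair.
runs = [
    dict(FIXED, precision=precision, model=model)
    for precision, model in product(PRECISIONS, MODELS)
]

for number, run in enumerate(runs, start=1):
    print(f"run {number}: model {run['model']}, {run['precision']}, "
          f"x{run['multiplier']}, scale={run['scale']}")
    print("  record: re-encoding speed [FPS], CUDA utilisation [%]")
```

That is four runs per graphics card, each reporting two numbers.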

dlr5668 wrote:

I also reuploaded image. Almost good to distribute with svp installer like regular RIFE ;) You can remove all .lib and *train* files from pytorch + apply NTFS compress to reduce unpacked

https://i.imgur.com/gljSMfI.png


I think it's worth it...

Quaternions wrote:

4k runs at 11.6fps which is much faster than the 1.9fps I got without cuda

and this...

https://github.com/HolyWu/vs-rife/blob/ … t__.py#L20

lwk7454 wrote:

Here are the test results:

Parameters:
Test-Time Augmentation: Enabled [sets RIFE filter for VapourSynth (PyTorch)]
re-encoding with x2 interpolation
RIFE model: 3.8
scale=1.0
Encoder: NVIDIA NVENC H.264

720p, FP16
FPS: 63.5
Cuda: 56%

720p, FP32
FPS: 69.8
Cuda: 58%

1080p, FP16
FPS: 26.9
Cuda: 62%

1080p, FP32
FPS: 28.1
Cuda: 66%

I've also tried TTA Disabled for comparison, just 1 test:
720p, FP32
FPS: 25.1
Compute_1: 100%
Cuda: 15%


lwk7454, could you do the same tests for the latest 4.0 model?

The links to the files you tested are in this post of mine: https://www.svp-team.com/forum/viewtopi … 525#p79525

Unfortunately dlr5668 deleted the 1080p file he used to test, but I guess any 1080p file should give similar results.

I will of course forward the results to the RIFE creator here:
https://github.com/hzwer/arXiv2020-RIFE/issues/217

dlr5668 wrote:

Yep. I use my PC as server for hyper-v machines so I cant change GPU driver or install CUDA toolkit

4.0 is also better under GPU load. 3.8 dropped my encode fps to 16 and 4.0 maintains 30-35 with less VRAM. Crazy magic

Thanks for the new information dlr5668. I just don't know where the drop in fps to 30-35 comes from? Is the earlier result some kind of GPU boost?

I also have a question: would you be able to do the test I asked Quaternions for in the post above? I am keen to know the performance in fps together with the GPU load (CUDA).

I don't think you need to install the CUDA toolkit. It is probably enough to disable hardware scheduling, as lwk7454 wrote about: https://www.svp-team.com/forum/viewtopi … 497#p79497

You could probably find that the 3D load will equal the CUDA load. That's what I think, after what lwk7454 described, but I'm not sure. Unfortunately, it's hard to write something when I don't have a proper graphics card myself yet.

I don't know if lwk7454 is reading this thread and will be able to do some more testing. That's why it would be good if you could do a test like the one I describe for both the 3.8 and 4.0 models. The idea is to have consistent data and see how the utilisation of computing power, as a percentage, changed between these models on the same graphics card. The load on the 3090 and 3070 Ti cards can be quite different.

Quaternions wrote:

unfortunately the interpolated video has striped artifacts outside of a 2048x2048 area

Did you use FP32 or FP16? What you describe is a known problem for FP16:
https://github.com/hzwer/arXiv2020-RIFE/issues/188


Quaternions wrote:

3080 ti is very close to real time speed with cuda for 1080p

Thanks a lot for the testing and feedback on the new model. In fact the latest graphics cards are one step away from interpolating 1080p files in real time using RIFE. That is why our feedback to the RIFE developer is so important. Maybe something more can be squeezed out of RIFE?

I have written a request for help in optimizing RIFE for the latest graphics cards. You can find the details at this link: https://github.com/hzwer/arXiv2020-RIFE/issues/217

Now I need one or, even better, two or three people to check the GPU (CUDA) load during interpolation. I would be very grateful if you could do a simple test including CUDA load:


And here are the details of the test:

Fixed test parameters:

SVP & RIFE filter for VapourSynth (PyTorch)
re-encoding with x2 interpolation
RIFE model: 4.0
scale=1.0

Variable test parameters:

Math precision: FP16 and FP32

Test results:

re-encoding speed [FPS]
CUDA utilisation [%]

Video file:

original demo video from the creator of RIFE at: https://github.com/hzwer/arXiv2020-RIFE
720p (1280x720), 25FPS, 53 s 680 ms, 4:2:0 YUV, 8 bits
direct link: https://drive.google.com/file/d/1i3xlKb … sp=sharing


If the CUDA load is not displayed then you can check my post here: https://www.svp-team.com/forum/viewtopi … 493#p79493 and especially lwk7454's reply here: https://www.svp-team.com/forum/viewtopi … 497#p79497


If, in addition, you could find the time to do a similar test with some 1080p file then all the better and more data for the RIFE developer to analyse.


Quaternions wrote:

All in all very cool, I will definitely be watching anything 720p with rife

Once we can achieve x2 real-time interpolation for 1080p files, we will also be able to interpolate 720p files x3 in real time. So it's worth testing and giving precise data to the RIFE developer. When he sees how close it is, I think he will find a way to optimize RIFE even more.
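A rough sketch of why I expect x3 at 720p to fit whenever x2 at 1080p fits. It assumes, and this is only my assumption, that the cost of interpolation scales with the number of interpolated pixels generated per second; the function name and the 25 fps source rate are chosen just for illustration.

```python
def interpolated_pixel_rate(width, height, source_fps, multiplier):
    """Pixels of interpolated frames that must be generated per second."""
    return width * height * source_fps * (multiplier - 1)


SOURCE_FPS = 25  # assumption: same source frame rate in both cases

rate_1080p_x2 = interpolated_pixel_rate(1920, 1080, SOURCE_FPS, 2)
rate_720p_x3 = interpolated_pixel_rate(1280, 720, SOURCE_FPS, 3)
ratio = rate_720p_x3 / rate_1080p_x2

print(f"1080p x2: {rate_1080p_x2:,} interpolated pixels/s")
print(f"720p  x3: {rate_720p_x3:,} interpolated pixels/s")
print(f"720p x3 needs {ratio:.0%} of the 1080p x2 pixel rate")
```

Real performance will not scale perfectly with pixel count, so this is only an estimate until someone tests it.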