Chainik wrote: if you think that asking the same question ten times will change anything - you're wrong
Of course, if you perceived my posts as time pressure then I sincerely apologise.
I am keen to provide feedback to the RIFE developer as soon as possible on the new 4.0 model in terms of performance and CUDA core loading on the latest graphics cards. I posted my request to the RIFE developer when the 3.8 model was available: https://github.com/hzwer/arXiv2020-RIFE/issues/217 However, we now have the 4.0 model which, as the tests posted here show, is a colossal leap in performance.
I am afraid that if there is no feedback, the RIFE developer will close the thread, concluding that since he added a new model and there are no comments, the issue is resolved. So far we have tests done by Quaternions, which show that paradoxically the 4.0 model is not only faster, but also probably less demanding on CUDA cores than the 3.8 model: https://www.svp-team.com/forum/viewtopi … 727#p79727
I write "probably" because the test results for the 720p RIFE demo file show the following load on the CUDA cores:
Cuda: 58%, FPS: 69.8, FP32, Model 3.8, NVIDIA GeForce RTX 3090
https://www.svp-team.com/forum/viewtopi … 723#p79723
Cuda: 46%, FPS: 91.3, FP32, Model 4.0, NVIDIA GeForce RTX 3080 Ti
https://www.svp-team.com/forum/viewtopi … 727#p79727
Because the tests were done on two different computers with two different graphics cards, we can't be entirely sure the same conclusion would hold when both models are tested on the same hardware. This is why I asked 3 people on this thread to either test both models or complete the test for the model they are missing.
That said, the 3080 Ti and 3090 cards are pretty close in performance, and the difference of 12 percentage points in CUDA load (58% vs 46%) between the two RIFE models is likely to be confirmed by tests on the same hardware. This would mean that despite the optimisation of the 4.0 model, the potential for further optimisation has paradoxically increased, because the CUDA load has decreased and, as you can see, is not the bottleneck here.
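To make the comparison concrete, here is a small sketch that works out the differences from the two results quoted above. It only restates those four numbers; since they come from two different cards, the output is indicative at best. The variable names are my own.

```python
# Figures quoted above. Different GPUs, so this is only an indicative comparison.
results = {
    "3.8": {"gpu": "RTX 3090", "cuda_load_pct": 58.0, "fps": 69.8},
    "4.0": {"gpu": "RTX 3080 Ti", "cuda_load_pct": 46.0, "fps": 91.3},
}

old, new = results["3.8"], results["4.0"]

# Absolute drop in CUDA load, in percentage points.
load_drop_points = old["cuda_load_pct"] - new["cuda_load_pct"]
# The same drop expressed relative to the 3.8 load.
load_drop_relative = load_drop_points / old["cuda_load_pct"] * 100
# Relative FPS gain of 4.0 over 3.8.
fps_gain_relative = (new["fps"] / old["fps"] - 1) * 100

print(f"CUDA load drop: {load_drop_points:.0f} points ({load_drop_relative:.1f}% relative)")
print(f"FPS gain: {fps_gain_relative:.1f}%")
```

So the "12%" is really 12 percentage points, which is roughly a fifth less CUDA load, alongside roughly 30% more FPS. Whether that holds on a single card is exactly what the requested tests should show.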
Now the most important thing: 91.3 fps indicates that we are very close to x3 interpolation of 720p files in real time! Perhaps this is already possible if x3 interpolation benefits from some synergy effect, whether better parallel processing, lower VRAM bandwidth needs, or better use of CUDA processing power. Without testing we don't know. And for testing we need your help, Chainik, in making this parameter https://github.com/HolyWu/vs-rife/blob/ … t__.py#L20 work with SVP.
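A rough way to see how close "very close" is: real-time playback needs the interpolator to deliver source frame rate times the multiplier. The sketch below assumes the benchmark FPS counts output frames and that x3 would run at the same rate as the measured run. Both are assumptions on my part, and the second one is precisely what we cannot know without testing. The source frame rates are just common examples, and the function names are hypothetical.

```python
def required_fps(source_fps: float, factor: int) -> float:
    """Output frame rate needed to play an interpolated video in real time."""
    return source_fps * factor


def realtime_ok(measured_fps: float, source_fps: float, factor: int) -> bool:
    """True if the measured throughput covers the required output rate."""
    return measured_fps >= required_fps(source_fps, factor)


# Model 4.0 result quoted above (RTX 3080 Ti, FP32, 720p demo file).
measured_fps = 91.3

# Common source frame rates, chosen only as examples.
for source_fps in (23.976, 25.0, 29.97):
    for factor in (2, 3):
        need = required_fps(source_fps, factor)
        verdict = "ok" if realtime_ok(measured_fps, source_fps, factor) else "too slow"
        print(f"{source_fps:>6} fps x{factor}: need {need:.1f} fps -> {verdict}")
```

Under those assumptions the margin for x3 on a 29.97 fps source is very thin, which is why actual x3 measurements matter.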
With full x2 and x3 test results we can give complete feedback to the RIFE developer, and maybe he can find a way to make the model even more efficient by making better use of CUDA cores and maybe even Tensor cores.