movie *3 :
Thanks a lot! 4K HDR x3 in real time with RIFE!!! Unbelievable!!!
And I so wanted to save on RAM and CPU
Do you see any colour difference on the HDR screen watching this demo without interpolation and with RIFE interpolation?
SmoothVideo Project → Posts by UHD
I replicated exactly the same settings as shown by Pezede for transcoding -> frames by 5 for a 24p 1080p movie, lossless preset etc.
Ryzen 5600X, 3800 MHz memory -> starts at about 190 fps
Doing the same for a 4K UHD movie -> 32.8 fps
Same for the LG New York demo clip: 32 fps
Is your 1080p also 1920x1080, because that matters too?
UHD wrote:
DragonicPrime wrote:
Just tried the updated instructions. Getting around 50fps at 4k with an RTX 4090 now. Between 170-190fps(it kept going up and down for some reason) at 1080p. These improvements are huge. Thanks for updating the instructions
Could you check if the video below can now be interpolated in real time with RIFE?
Second question: is it possible to preserve the 10 bit colour depth and HDR when interpolating in real time with RIFE the video below?
In other words, could you compare the colours of the video below played back without interpolation and with RIFE interpolation.
LG 4K HDR Demo - New York.ts
File size: 448 MiB
Duration: 1 min 13 sec
Overall bit rate: 51.4 Mbps
HDR format: SMPTE ST 2086, HDR10 compatible
Width: 3 840 pixels
Height: 2 160 pixels
Frame rate: 25.000 FPS
Color space: YUV
Chroma subsampling: 4:2:0
Bit depth: 10 bits
Direct link: https://drive.google.com/file/d/1dfR5TT … _bGfEXUvJ/
Source: http://hdr4k.blogspot.com/
4K HDR still doesn't work in real time. I just updated my previous message as well. 4K SDR seems to work with no problems in real time though. With 4K HDR, I only get around 35-40 fps.
Pezede, would you please check the above demo in real time 4K HDR and fps, we would have an interesting comparison.
I've done a reinstall of SVP and I'm now getting ~280fps on 1080P transcoding with the new guide, that seems almost miraculous...
I've gotten the console window and there are lock files in the rife folder so it seems to be used.
Hardware is 4090 paired with a 7950X and DDR5 6000 ram.
so CPU and RAM really matter!
On my second monitor which is just 1080p, MPV is downscaling it, so it runs perfectly
The question is whether MPV downscaling in real time to 1080p preserves the 10-bit colour depth and HDR when interpolating in real time with RIFE.
In other words, could you compare the colours of the video below played back without interpolation and with RIFE interpolation.
and what interpolation factor (x2, x3, x4) is possible with such a 1080p 10-bit HDR file:
LG 4K HDR Demo - New York.ts
compared to 1080p 8-bit?
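To make the 10-bit question concrete, here is a minimal sketch of what is lost if any stage of the chain outputs 8-bit (for example, the mpv script quoted later in this thread converts to vs.YUV420P8). It assumes a plain truncation by two bits; real converters scale and dither, so treat this only as an illustration of the level count, not of what mpv or SVP actually do.

```python
# Sketch: how many 10-bit code values survive a conversion to 8 bits.
# Assumption: plain truncation (right shift by 2), no dithering.

def to_8bit(code10: int) -> int:
    """Truncate a 10-bit code value (0..1023) to 8 bits (0..255)."""
    if not 0 <= code10 <= 1023:
        raise ValueError("expected a 10-bit code value")
    return code10 >> 2

levels_10bit = 2 ** 10
levels_8bit = 2 ** 8

# Count the distinct values left after pushing every 10-bit code through 8 bits.
distinct_after = len({to_8bit(v) for v in range(levels_10bit)})

print(f"10-bit levels: {levels_10bit}")
print(f"8-bit levels:  {levels_8bit}")
print(f"distinct levels after conversion: {distinct_after}")
```

Four neighbouring 10-bit codes end up on the same 8-bit code, which is why a colour comparison with and without interpolation is a sensible test.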
will do this soon .. currently generating the engine files for 4k/UHD resolution
If you can, please also test the file that I asked DragonicPrime about in the post above. I'm very curious, as I'm also planning to buy a 4090 graphics card. If two people succeed, it will be the best confirmation of the capabilities of these highest-performance graphics cards.
Just tried the updated instructions. Getting around 50fps at 4k with an RTX 4090 now. Between 170-190fps(it kept going up and down for some reason) at 1080p. These improvements are huge. Thanks for updating the instructions
Could you check if the video below can now be interpolated in real time with RIFE?
Second question: is it possible to preserve the 10 bit colour depth and HDR when interpolating in real time with RIFE the video below?
In other words, could you compare the colours of the video below played back without interpolation and with RIFE interpolation.
LG 4K HDR Demo - New York.ts
File size: 448 MiB
Duration: 1 min 13 sec
Overall bit rate: 51.4 Mbps
HDR format: SMPTE ST 2086, HDR10 compatible
Width: 3 840 pixels
Height: 2 160 pixels
Frame rate: 25.000 FPS
Color space: YUV
Chroma subsampling: 4:2:0
Bit depth: 10 bits
Direct link: https://drive.google.com/file/d/1dfR5TT … _bGfEXUvJ/
Source: http://hdr4k.blogspot.com/
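A quick way to judge whether a reported speed is enough for this 25 fps demo: real-time playback needs at least source fps times the interpolation factor. The sketch below assumes the fps figures quoted in this thread count output (interpolated) frames; the function names are only for illustration.

```python
# Sketch: is a measured interpolation speed enough for real-time playback?
# Assumption: measured_fps counts output frames, not source frames.

def required_fps(source_fps: float, factor: int) -> float:
    """Output frame rate needed to play a source in real time at a given factor."""
    return source_fps * factor

def is_realtime(measured_fps: float, source_fps: float, factor: int) -> bool:
    """True if the measured speed meets or exceeds the required output rate."""
    return measured_fps >= required_fps(source_fps, factor)

SOURCE_FPS = 25.0  # frame rate of "LG 4K HDR Demo - New York.ts"

for measured in (50.0, 35.0):  # 4K figures reported in this thread for an RTX 4090
    for factor in (2, 3, 4):
        needed = required_fps(SOURCE_FPS, factor)
        verdict = "ok" if is_realtime(measured, SOURCE_FPS, factor) else "too slow"
        print(f"measured {measured:5.1f} fps, x{factor}: need {needed:5.1f} fps -> {verdict}")
```

With no headroom at exactly the required rate, a borderline result will probably still drop frames in practice.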
I deleted SVP and started from scratch, just to check if the instructions are complete and everything is working.
There is a step missing.
After replacing generate.js and base.py, start SVP4, add the new option TensorRT etc.
Then the missing step:
Copy the Rife AI profile and select the AI Model "rife"
Enable the new Option TensorRT On
This is a good solution. Start from scratch and describe all the steps that are missing.
In other words, create some instruction for a completely new person so that they do not get lost.
GOOD NEWS EVERYONE!
updated instructions
should improve FPS on 4080-and-better (probably 4070/3080 too, dunno), when performance is bound by the system's RAM bandwidth, not GPU power
i.e. for 4K playback
not sure what you're doing, but it's OK even on a 2060 laptop now
What exactly did you do? I'm very curious to know what solved the memory problems. 1080p real time with RIFE using 2060 laptop is impressive!
In my opinion, the more options are tested and the more test details are given the better. As for example in this already quite old post:
### Environment ###
Windows 10
DDR4-2933 48GiB
Nvidia RTX2070 8GiB
Nvidia Driver 511.79
CUDA Toolkit 11.3
cuDNN v8.2.1 (June 7th, 2021), for CUDA 11.x
### Software ###
Python 3.10.4
VapourSynth R58-RC2
PyTorch 1.11.0 (CUDA 11.3)
vs_rife v2.0.0
VapourSynth-RIFE-ncnn-Vulkan r3 (model: 4.0)
### Tools & Setting ###
GPU-Z 2.45.0
VapourSynth Editor r19-mod-5-AC2
VapourSynth threads: core.num_threads = 4
Decoder: lsmas.LWLibavSource(format="yuv420p8", prefer_hw=3)
Video: demo.mp4 [720p]
### Result ###
1. RIFE filter for VapourSynth (PyTorch CUDA) - vs_rife v2.0.0
Interpolation: x2
RIFE model: 4.0
scale: 1.0
FP16: False
FPS: 54.115
CUDA: ~50%
PerfCap: VRel, VOp, Pwr
2. RIFE filter for VapourSynth (PyTorch CUDA) - vs_rife v2.0.0
Interpolation: x2
RIFE model: 4.0
scale: 0.5
FP16: False
FPS: 69.997
CUDA: ~40%
PerfCap: VRel, VOp
3. RIFE filter for VapourSynth (PyTorch CUDA) - vs_rife v2.0.0
Interpolation: x2
RIFE model: 4.0
scale: 0.5
FP16: True
FPS: 70.936
CUDA: ~32%
PerfCap: VRel, VOp
4. RIFE filter for VapourSynth (ncnn Vulkan) - VapourSynth-RIFE-ncnn-Vulkan r3
Interpolation: x2
RIFE model: 4.0
GPU thread: 1
tta: False
uhd: False
sc: True
FPS: 27.356
CUDA: ~1%
Compute_1: 30%
PerfCap: Idle
5. RIFE filter for VapourSynth (ncnn Vulkan) - VapourSynth-RIFE-ncnn-Vulkan r3
Interpolation: x2
RIFE model: 4.0
GPU thread: 2
tta: False
uhd: False
sc: True
FPS: 92.956
CUDA: ~15%
Compute_1: ~94%
PerfCap: VRel, VOp, Pwr
6. RIFE filter for VapourSynth (ncnn Vulkan) - VapourSynth-RIFE-ncnn-Vulkan r3
Interpolation: x2
RIFE model: 4.0
GPU thread: 2
tta: False
uhd: True
sc: True
FPS: 92.366
CUDA: ~15%
Compute_1: ~94%
PerfCap: VRel, VOp, Pwr
7. RIFE filter for VapourSynth (ncnn Vulkan) - VapourSynth-RIFE-ncnn-Vulkan r3
Interpolation: x2
RIFE model: 4.0
GPU thread: 2
tta: False
uhd: False
sc: False
FPS: 87.083
CUDA: ~15%
Compute_1: ~94%
PerfCap: VRel, VOp, Pwr
8. RIFE filter for VapourSynth (ncnn Vulkan) - VapourSynth-RIFE-ncnn-Vulkan r3
Interpolation: x2
RIFE model: 4.0
GPU thread: 3
tta: False
uhd: False
sc: True
FPS: 90.645
CUDA: ~15%
Compute_1: ~94%
PerfCap: Idle
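To compare the eight results above at a glance, a small sketch that ranks them and shows each one relative to the slowest run (ncnn Vulkan with 1 GPU thread). The FPS numbers are copied from the post; the short labels are my own shorthand.

```python
# Sketch: rank the eight 720p x2 results quoted above (RTX 2070, model 4.0).
# FPS values are from the post; the labels are shorthand, not official names.
results = {
    "1 vs_rife scale=1.0 fp16=False": 54.115,
    "2 vs_rife scale=0.5 fp16=False": 69.997,
    "3 vs_rife scale=0.5 fp16=True": 70.936,
    "4 ncnn gpu_thread=1": 27.356,
    "5 ncnn gpu_thread=2": 92.956,
    "6 ncnn gpu_thread=2 uhd=True": 92.366,
    "7 ncnn gpu_thread=2 sc=False": 87.083,
    "8 ncnn gpu_thread=3": 90.645,
}

baseline = results["4 ncnn gpu_thread=1"]

# Sort from fastest to slowest and print the speed relative to the baseline.
ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
for label, fps in ranked:
    print(f"{label:<32} {fps:8.3f} fps  x{fps / baseline:.2f}")
```

In this particular test, going from 1 to 2 GPU threads helped far more than any other setting, and a third thread gave nothing extra.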
with software transcoding ?
288 new fps (1080p) for 4090+13900k (TensorRT8.5+vs_threads=4+fp16) (rife46) (num_streams=10) (benchmark was done with vspipe file.py -p . instead of piping into ffmpeg and rendering to avoid cpu bottleneck)
164 new fps (1080p) for 4090+5950x (ncnn+2 threads+4 vs threads+ffmpeg (ultrafast) (rife4.6)
Source: https://github.com/styler00dollar/VSGAN-tensorrt-docker
It is best to check and test all options.
UHD
> We are now testing
you are not
I'm testing virtually without a proper graphics card, and this is even more difficult
This screenshot shows software H.264 transcoding. If I use the settings from the screenshot, my 6-core CPU is at 100%, the GPU with 2 threads is at about 35% utilization, and I get 19 fps (4K) or 41 fps (1080p; it starts above 50 but stabilizes at 41 after some time).
Use the 'ultrafast' preset and let us know if performance has improved:
https://trac.ffmpeg.org/wiki/Encode/H.264
We are now testing RIFE and looking for bottlenecks
Tested this out with an RTX 4090 and seem to be getting around 115fps on a 1080p video. So much better than the default implementation. Used to only get around 80fps with the default
It is good that there are more 4090 card owners on this forum. It will be easier to compare results
Thanks, I will investigate later tonight how to apply this. Do you read the GitHub thread?
I hope a solution can be found. Looking at what the 3070Ti card can do, I'm very curious to see what will be achieved with the 4090. Bottlenecks will probably appear somewhere and if they can be identified then the potential for performance gains is huge. You are blazing a new trail, the next ones after you will find it easier
it did 10% better
Why not try even faster settings?
ultrafast
superfast
veryfast
faster
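For anyone testing outside SVP, here is a minimal sketch of how one of these x264 presets could be selected when piping a VapourSynth script into ffmpeg. The file names are placeholders, and the exact command SVP builds internally is not shown in this thread, so this is only an assumption about an equivalent manual test.

```python
# Sketch: build a "vspipe | ffmpeg" command line with a chosen x264 preset.
# Assumptions: script.vpy and out.mkv are placeholder names; SVP's own
# transcoding command may differ from this manual equivalent.
import shlex

X264_PRESETS = ("ultrafast", "superfast", "veryfast", "faster")

def build_pipeline(script: str, output: str, preset: str = "ultrafast") -> str:
    """Return a shell command that encodes a VapourSynth script with libx264."""
    if preset not in X264_PRESETS:
        raise ValueError(f"unsupported preset: {preset}")
    # vspipe writes Y4M to stdout ("-"), ffmpeg reads it from stdin ("-i -").
    vspipe = ["vspipe", "-c", "y4m", script, "-"]
    ffmpeg = ["ffmpeg", "-i", "-", "-c:v", "libx264", "-preset", preset, output]
    return f"{shlex.join(vspipe)} | {shlex.join(ffmpeg)}"

for preset in X264_PRESETS:
    print(build_pipeline("script.vpy", "out.mkv", preset))
```

If the fps does not change between presets, the encoder is probably not the bottleneck and the limit is elsewhere (decoding, RAM bandwidth or the GPU).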
flownet_v4.6.pkl_NVIDIA GeForce RTX 3070 Ti_trt-8.5.2.2_1280x768_fp32_workspace-1073741824_scale-1.0_ensemble-False.pt
flownet_v4.6.pkl_NVIDIA GeForce RTX 4090_trt-8.5.2.2_3840x2176_fp32_workspace-1073741824_scale-1.0_ensemble-False
clip = clip.resize.Bicubic(format=vs.RGBS, matrix_in_s="709")
I will be following this thread over the weekend. For today, I propose one last change in the line above:
vs.RGBS
to
vs.RGBH
this should force FP16 precision in vs-rife and double the performance.
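One effect of this change that can be checked on paper is the amount of data per frame: RGBS stores 32-bit float samples and RGBH stores 16-bit half-float samples. The sketch below only counts raw sample bytes (it ignores any row padding VapourSynth may add) and says nothing about the actual speed-up, which has to be measured.

```python
# Sketch: raw size of one RGB frame in RGBS (32-bit float) vs RGBH (16-bit float).
# Assumption: three full-resolution planes, no stride padding.
BYTES_PER_SAMPLE = {"RGBS": 4, "RGBH": 2}

def frame_bytes(width: int, height: int, fmt: str) -> int:
    """Bytes needed for one RGB frame of the given size and sample format."""
    return width * height * 3 * BYTES_PER_SAMPLE[fmt]

for name, (w, h) in {"1080p": (1920, 1080), "4K UHD": (3840, 2160)}.items():
    rgbs = frame_bytes(w, h, "RGBS")
    rgbh = frame_bytes(w, h, "RGBH")
    print(f"{name}: RGBS {rgbs / 2**20:.1f} MiB, RGBH {rgbh / 2**20:.1f} MiB per frame")
```

Halving the bytes per frame matters most when performance is limited by memory bandwidth rather than GPU power, which is the situation described for 4K playback in this thread.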
I am new to SVP but due to RIFE implementation I ordered a 4090. If you guide me how to test / benchmark I will show all results
I have identical plans to buy a NVIDIA GeForce RTX 4090 also because of RIFE. I hope we can work something out together to make the fastest RIFE filter work in real time.
RTX4090: realtime mpv crashes or does not even start
You are already the third person to confirm this problem. Thanks for the tests
the code for mpv.
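import vapoursynth as vs
from vsrife import RIFE

core = vs.core
clip = video_in  # source clip supplied by mpv's vapoursynth filter
clip = clip.resize.Bicubic(format=vs.RGBS, matrix_in_s="709")  # YUV -> 32-bit float RGB for vs-rife
clip = RIFE(clip, trt=True, factor_num=5, factor_den=1)  # x5 interpolation with TensorRT enabled
clip = clip.resize.Bicubic(format=vs.YUV420P8, matrix_s="709")  # back to 8-bit YUV 4:2:0
clip.set_output()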
If I understood correctly, exactly the same settings, but without trt=True (or with trt=False) in your case allow for smooth real-time interpolation?
Can you post the settings script you used with mpv?
Something like here: https://github.com/HolyWu/vs-rife/issue … -967073164 but of course together with all the parameters you set for the vs-rife filter
If this vs-rife filter used directly with mpv with setting:
trt=False
allows real-time interpolation of 720p files
and with the setting
trt=True
does not allow real-time interpolation of 720p files, this means that we should report the issue to HolyWu.
Can you post the settings script you used with mpv?
I still don't have a graphics card that allows me to test, but if someone else confirms the same problem then we can report it to HolyWu.
The only point I can see in adding this filter to SVP is that it can interpolate in real time, and faster than using RIFE-ncnn-Vulkan.
I tried vsrife trt before, but it doesn't work in real-time MPV, only when transcoding.
Have you tried directly using this filter https://github.com/HolyWu/vs-rife in real time with mpv even on video with lower resolution, for example 720p?
Thanks for the tests aloola
maybe you should try this? https://github.com/AmusementClub/vs-mlrt/wiki/RIFE
it works fine with mpc and mpv for me, 1080px3 in realtime.
I think we should first try to find the cause of the problems. Particularly since, with real-time interpolation, every frame matters.
vs-rife using TensorRT should be faster than vs-mlrt using TensorRT by at least 36% - https://github.com/HolyWu/vs-rife/discussions/19 :
45.91 fps NVIDIA GeForce RTX 3050 (1080p, FP16, model 4.6, vs-rife using TensorRT)
33.66 fps NVIDIA GeForce RTX 3050 (1080p, FP16, model 4.6, vs-mlrt using TensorRT)
Knowing the performance of graphics cards:
Fourth-generation Tensor Cores - Peak FP16 using the Sparsity feature:
660.6 TFLOPS - NVIDIA GeForce RTX 4090
https://images.nvidia.com/aem-dam/Solut … ecture.pdf
Third-Generation Tensor Cores - Peak FP16 using the Sparsity feature:
174 TFLOPS - NVIDIA GeForce RTX 3070 Ti
https://www.anandtech.com/show/17204/nv … more-money
72.8 TFLOPS - NVIDIA GeForce RTX 3050
https://www.computerbase.de/2022-01/nvi … 3050-test/
we should get the following results, at least in theory with scaling proportional to the increase in performance:
109.73 fps NVIDIA GeForce RTX 3070 Ti (1080p, FP16, model 4.6, vs-rife using TensorRT)
416.60 fps NVIDIA GeForce RTX 4090 (1080p, FP16, model 4.6, vs-rife using TensorRT)
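The arithmetic behind these estimates, as a short sketch: the measured RTX 3050 result is scaled by the ratio of peak FP16 Tensor TFLOPS (with sparsity) listed above. Linear scaling is purely an assumption; nothing guarantees RIFE throughput follows peak TFLOPS.

```python
# Sketch: project vs-rife TensorRT fps from the RTX 3050 measurement,
# assuming (optimistically) linear scaling with peak FP16 Tensor TFLOPS.
MEASURED_FPS_3050 = 45.91   # vs-rife using TensorRT, 1080p, FP16, model 4.6
VSMLRT_FPS_3050 = 33.66     # vs-mlrt using TensorRT, same conditions

TFLOPS = {"RTX 3050": 72.8, "RTX 3070 Ti": 174.0, "RTX 4090": 660.6}

def projected_fps(card: str) -> float:
    """Scale the RTX 3050 measurement by the TFLOPS ratio of the given card."""
    return MEASURED_FPS_3050 * TFLOPS[card] / TFLOPS["RTX 3050"]

# Relative advantage of vs-rife over vs-mlrt on the RTX 3050, in percent.
advantage_pct = (MEASURED_FPS_3050 / VSMLRT_FPS_3050 - 1) * 100

print(f"vs-rife advantage over vs-mlrt: {advantage_pct:.1f}%")
for card in ("RTX 3070 Ti", "RTX 4090"):
    print(f"{card}: {projected_fps(card):.2f} fps (theoretical)")
```

For comparison, the 1080p figures reported in this thread for the RTX 4090 are around 280-288 fps, well below the theoretical number, which suggests the limit lies somewhere other than raw Tensor Core throughput.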
=== RIFE / PyTorch+TensorRT installation ===
Huge thanks Chainik
Increase GPU threads to 2. This will double the performance.
But even then, don't count on much, as the GeForce RTX 3070 Laptop GPU has very limited Tensor Cores capabilities compared to its desktop counterpart: https://en.wikipedia.org/wiki/GeForce_30_series
TensorRT can add another 50% performance:
https://github.com/HolyWu/vs-rife/discu … nt-4117604