NM64 wrote:It may have been easier if you just used the newly-developed rusticl paired with the standard default kernel-level AMD graphics drivers and mesa 22.3:
https://lists.freedesktop.org/archives/ … 25898.html
I just never said much of anything about it because I've not managed to test it out myself just yet and therefore I cannot say from experience what the exact steps are to enable rusticl for AMD graphics considering that rusticl as a whole is disabled by default.
That won’t work either. With rusticl, SVP will stay broken for all open-source drivers for the foreseeable future. I got as far as the preliminary radeonsi support in rusticl, which currently advertises OpenCL 1.2. SVP won’t detect it, though (the tool itself works in a shell), because it links against its own mesa and, more importantly, its own LLVM. But let’s start from the beginning.
To hunt down that LLVM error, I looked at the source and eventually found the related bug [1]. The error comes down to libllvm being linked twice and is triggered by the CommandLine module. So I went ahead, commented out the fatal errors, and recompiled LLVM and then the whole system. This resulted in… nothing. The message persists, so it must be coming from the LLVM that SVP links to. The main difference gdb shows in mpv between OpenCL and non-OpenCL use is simply the linking to libllvm. That means the libsvpflow libraries already link against it statically, and the dynamic load is the second copy, which leads to the abort. This also explains why the SVP performance test works for OpenCL.
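A quick way to check the same thing without gdb is to look at which libLLVM files a running process has mapped. This is only a sketch: the function name llvm_mappings is made up for illustration, it assumes a Linux /proc filesystem, and a statically linked LLVM copy will not show up this way because it has no file of its own.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct libLLVM files mapped by a process.
# $1 is a maps file in the /proc/<pid>/maps format, where the sixth
# whitespace-separated field is the path of the mapped file.
llvm_mappings() {
    awk '$6 ~ /libLLVM/ { print $6 }' "$1" | sort -u
}

# Usage: inspect a running mpv, if there is one.
if pid=$(pidof -s mpv 2>/dev/null); then
    llvm_mappings "/proc/$pid/maps"
fi
```

More than one distinct path in the output would suggest that two different LLVM builds ended up in the same process.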
Afterwards I prepared a Gentoo mesa ebuild update for rusticl support. While I can’t actually test rusticl because of the missing OpenCL support for radeonsi, I now know for certain that it also links to libllvm, so it will be broken with SVP’s current linking.
I see about 3 solutions:
1. Link dynamically to the system libs, at least for libllvm.
2. Ask LLVM upstream for a change that tolerates double linkage, if that is feasible. If it is not, ask for suggestions on how to get this software combination to work.
3. Remove the dependency on libllvm, if possible. If the Manager’s GUI libraries are the sole consumers, provide a CLI tool that can act as a daemon, read the config and enable the plugin on video playback. The SVP overlay is not mandatory beyond the configuration phase.
Edit:
And a 4th:
Compile mesa with -Dgallium-opencl (Clover) support or install the corresponding package for your distribution. That prevents the error for whatever reason. ROCm is still used as the OpenCL backend, because unmerging it leads to OpenCL not being detected in SVP. So although Clover is not used in any way, it has to be present for SVP.
Maybe the linking is different in the system mesa lib, or SVP loads system libs depending on certain symbols being present in them, which prevents the LLVM error from being thrown, but I’m just guessing. Considering these facts, rusticl might in fact be a solution. Note that Clover is going to be removed upstream, so if rusticl does not work, we still have a problem.
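To see which OpenCL implementations are registered on the system at all (Clover, ROCm, rusticl), it can help to list the ICD files the loader reads. A minimal sketch, assuming the usual /etc/OpenCL/vendors directory of the Khronos ICD loader; list_icds is a made-up name, and your distribution may register ICDs elsewhere.

```shell
#!/bin/sh
# Hypothetical helper: print each registered ICD file and the library it names.
# $1 is the vendors directory (assumed default: /etc/OpenCL/vendors).
list_icds() {
    dir=${1:-/etc/OpenCL/vendors}
    for f in "$dir"/*.icd; do
        # Skip the unexpanded glob when the directory has no .icd files.
        [ -f "$f" ] || continue
        printf '%s -> %s\n' "$(basename "$f")" "$(head -n 1 "$f")"
    done
}

# Usage
list_icds
```

If clinfo is installed, `clinfo -l` gives the matching view from the loader's side, listing the platforms and devices that were actually detected.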
Edit2:
Nope, Clover was not the fix. ROCm actually broke again after restarting the X server, reproducing the LLVM error, just as the Gentoo Wiki warns. That is the very error that should have been patched out on my system, so it seems the patch did not work, which breaks my assumptions about SVP’s linking. So I did not actually change anything, yet it started to work after ~2 years, following a system update that contained neither mesa nor ROCm, but did contain mpv. Maybe it was mpv, a ROCm update two weeks ago, or a Gentoo patch that others are missing. Versions:
Rocm-5.3.3
mesa-22.2.3
opencl-icd-loader-2022.09.30
mpv-0.35.0-r1
Sorry for not having any clues to fix this one.
The last god damned edit:
Found it. The double linking occurs with radeonsi’s vaapi driver in mesa and rocm-comgr:
$ ldd /usr/lib64/va/drivers/radeonsi_drv_video.so
...
libLLVM-15.so => /usr/lib/llvm/15/lib64/libLLVM-15.so (0x00007f933e600000)
...
$ ldd /usr/lib64/libamd_comgr.so.2.4
...
libLLVM-15.so => /usr/lib/llvm/15/lib64/libLLVM-15.so (0x00007f933e600000)
...
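The same check can be run over several libraries at once. This is only a sketch: filter_llvm and scan_llvm are made-up helper names, and the two paths are the ones from the output above, so adjust them for your system.

```shell
#!/bin/sh
# Hypothetical helper: keep only the libLLVM lines of `ldd` output (stdin)
# and print "soname resolved-path" for each.
filter_llvm() {
    grep -E 'libLLVM[^ ]*\.so' | awk '{ print $1, $3 }'
}

# Hypothetical helper: report which of the given shared objects
# link against libLLVM. Missing files are skipped silently.
scan_llvm() {
    for lib in "$@"; do
        [ -e "$lib" ] || continue
        hit=$(ldd "$lib" 2>/dev/null | filter_llvm)
        if [ -n "$hit" ]; then
            printf '%s: %s\n' "$lib" "$hit"
        fi
    done
    return 0
}

# Usage
scan_llvm /usr/lib64/va/drivers/radeonsi_drv_video.so /usr/lib64/libamd_comgr.so.2.4
```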
I missed an occurrence of the report_fatal_error call in my LLVM patch; that is now fixed. I am providing the updated patch for reference. Recompiling mesa and rocm-comgr against the patched LLVM resolves the issue. The patched LLVM now prints a warning instead of aborting with an error, and that seems to have no other side effects.
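To avoid missing an occurrence the way I did, it helps to list every report_fatal_error call in the file before and after patching. A sketch, assuming the calls in question live in llvm/lib/Support/CommandLine.cpp of an unpacked LLVM source tree; the helper name fatal_calls and the LLVM_SRC variable are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the line numbers of all
# report_fatal_error calls in the given source file.
fatal_calls() {
    grep -n 'report_fatal_error' "$1" | cut -d: -f1
}

# Usage: LLVM_SRC is an assumed variable pointing at the LLVM source tree.
src="${LLVM_SRC:-.}/llvm/lib/Support/CommandLine.cpp"
if [ -f "$src" ]; then
    fatal_calls "$src"
fi
```

An empty result after patching means no call is left in that file.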
So to summarize:
1. You can either disable hardware decoding, which should get OpenCL to work, or disable SVP’s OpenCL use, which should allow hardware decoding.
2. You can patch LLVM and compile mesa and rocm-comgr with it, which gets both to work.
3. You can be angry with the LLVM developers for ignoring an issue for ≈8 years that could have been solved with a 17-line patch.
4. You’re welcome.
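For point 1, hardware decoding can be switched off in mpv with its hwdec option. A small sketch: the wrapper name mpv_sw and the DRY_RUN variable are made up for illustration, and video.mkv is a placeholder.

```shell
#!/bin/sh
# Hypothetical wrapper: start mpv with hardware decoding disabled.
# With DRY_RUN=1 it only prints the command it would run.
mpv_sw() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf 'mpv --hwdec=no'
        for a in "$@"; do printf ' %s' "$a"; done
        printf '\n'
    else
        mpv --hwdec=no "$@"
    fi
}

# Usage (placeholder file name):
#   mpv_sw video.mkv
```

Putting hwdec=no into mpv.conf has the same effect permanently.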
[1]: https://github.com/llvm/llvm-project/issues/23326