Mind Dump, Tech And Life Blog
written by Ivan Alenko
published under license CC4-BY
posted in category Systems Software / Docker
posted at 06. Jan '23

How to Set Up NVIDIA GPU Acceleration for Docker on openSUSE Tumbleweed

This article shows how to get GPU acceleration working in Docker for the Sygil WebUI Stable Diffusion container. openSUSE Tumbleweed is not officially supported, but the packages built for SLES 15.3 work.

Steps:

  • remove the container packages if present: zypper remove libnvidia-container-*
  • download https://nvidia.github.io/libnvidia-container/sles15.3/libnvidia-container.repo
  • add repository: zypper ar ~/Downloads/libnvidia-container.repo
  • install: zypper install nvidia-docker2
  • accept the repository key
  • answer yes to the file conflict shown below, but only if it is between these two packages (if there are conflicts with libnvidia-container-*, remove those packages first)
Detected 1 file conflict:

File /etc/docker/daemon.json
  from install of
     docker-20.10.21_ce-1.1.x86_64 (openSUSE-Tumbleweed-Oss)
  conflicts with file from install of
     nvidia-docker2-2.11.0-1.noarch (libnvidia-container)

File conflicts happen when two packages attempt to install files with the same name but different contents. If you continue, conflicting files will be replaced losing the previous content.
Continue? [yes/no] (no): yes
  • that’s all; running the Sygil WebUI container no longer fails with docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
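
The steps above can be collected into a small script. This is a minimal sketch, not a tested installer: the run helper, the DRY_RUN switch and the use of curl for the download are my own additions, and the final Docker restart is an assumption (the steps above do not mention it). By default the script only prints the commands; the zypper install step is left interactive so that the key and the file conflict can still be confirmed by hand.

```shell
#!/bin/sh
# Sketch of the installation steps. Dry run by default: commands are printed,
# not executed. Set DRY_RUN=0 and run as root to execute them.
set -e

DRY_RUN="${DRY_RUN:-1}"
REPO_URL="https://nvidia.github.io/libnvidia-container/sles15.3/libnvidia-container.repo"
REPO_FILE="${HOME}/Downloads/libnvidia-container.repo"

# Hypothetical helper: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

install_nvidia_docker() {
    # Remove previously installed container packages; ignore "nothing to remove".
    run zypper remove 'libnvidia-container-*' || true
    # Download the SLES 15.3 repo definition and register it with zypper.
    run curl -fsSL -o "$REPO_FILE" "$REPO_URL"
    run zypper ar "$REPO_FILE"
    # Interactive on purpose: accept the key and answer the conflict prompt manually.
    run zypper install nvidia-docker2
    # Assumption: restart Docker so that it rereads /etc/docker/daemon.json.
    run systemctl restart docker
}

install_nvidia_docker
```

Running it once in dry-run mode first makes it easy to compare the printed commands with the steps before anything is changed on the system.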

The list of packages that are installed:

  • kernel-firmware-nvidia
  • libnvidia-container-tools
  • libnvidia-container1
  • libnvidia-egl-wayland1
  • nvidia-computeG05 (nvidia-computeG06 is already available, but I haven’t tried it yet)
  • nvidia-computeG05-32bit
  • nvidia-container-toolkit
  • nvidia-container-toolkit-base
  • nvidia-docker2
  • nvidia-gfxG05-kmp-default
  • nvidia-glG05
  • nvidia-glG05-32bit
  • x11-video-nvidiaG05
  • x11-video-nvidiaG05-32bit
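
To compare another machine against this list, a short loop over rpm -q is enough. This is only a sketch: the check_packages function name is made up for illustration, and the package names are copied from the list above, so the G05 names will differ on systems using a newer driver series.

```shell
#!/bin/sh
# Package names copied from the list above.
PACKAGES="kernel-firmware-nvidia
libnvidia-container-tools
libnvidia-container1
libnvidia-egl-wayland1
nvidia-computeG05
nvidia-computeG05-32bit
nvidia-container-toolkit
nvidia-container-toolkit-base
nvidia-docker2
nvidia-gfxG05-kmp-default
nvidia-glG05
nvidia-glG05-32bit
x11-video-nvidiaG05
x11-video-nvidiaG05-32bit"

# Print one line per package: "installed" if rpm -q succeeds, "missing" otherwise.
# Note: every package is reported as missing if rpm itself is not available.
check_packages() {
    for pkg in $PACKAGES; do
        if rpm -q "$pkg" >/dev/null 2>&1; then
            printf '%-32s installed\n' "$pkg"
        else
            printf '%-32s missing\n' "$pkg"
        fi
    done
}

check_packages
```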

Dead Ends

libnvidia-container

You might want to compile https://github.com/NVIDIA/libnvidia-container yourself, but this is a dead end. The README is outdated, and so is the code: the make docker target does not even exist.

A plain make fails with an ld error about -soname (I tried 1.11.0 and 1.12.0.rc3):

damon@rapthalia:~/Downloads/libnvidia-container> make
mkdir -p /home/damon/Downloads/libnvidia-container/deps
make -f /home/damon/Downloads/libnvidia-container/mk/nvidia-modprobe.mk DESTDIR=/home/damon/Downloads/libnvidia-container/deps install
make[1]: Entering directory '/home/damon/Downloads/libnvidia-container'
install -d -m 755 /home/damon/Downloads/libnvidia-container/deps/usr/local/include /home/damon/Downloads/libnvidia-container/deps/usr/local/lib
install -m 644 /home/damon/Downloads/libnvidia-container/deps/src/nvidia-modprobe-495.44/modprobe-utils/nvidia-modprobe-utils.h /home/damon/Downloads/libnvidia-container/deps/src/nvidia-modprobe-495.44/modprobe-utils/pci-enum.h /home/damon/Downloads/libnvidia-container/deps/usr/local/include
install -m 644 /home/damon/Downloads/libnvidia-container/deps/src/nvidia-modprobe-495.44/modprobe-utils/libnvidia-modprobe-utils.a /home/damon/Downloads/libnvidia-container/deps/usr/local/lib
make[1]: Leaving directory '/home/damon/Downloads/libnvidia-container'
make -f /home/damon/Downloads/libnvidia-container/mk/nvcgo.mk DESTDIR=/home/damon/Downloads/libnvidia-container/deps MAJOR=1 VERSION=1.12.0 LIB_NAME=libnvidia-container-go install
make[1]: Entering directory '/home/damon/Downloads/libnvidia-container'
rm -f -rf /home/damon/Downloads/libnvidia-container/deps/src/nvcgo
cp -a -R /home/damon/Downloads/libnvidia-container/src/nvcgo /home/damon/Downloads/libnvidia-container/deps/src/nvcgo
make -C /home/damon/Downloads/libnvidia-container/deps/src/nvcgo VERSION=1.12.0 clean
make[2]: Entering directory '/home/damon/Downloads/libnvidia-container/deps/src/nvcgo'
rm -f -f libnvidia-container-go.so libnvidia-container-go.h
make[2]: Leaving directory '/home/damon/Downloads/libnvidia-container/deps/src/nvcgo'
make -C /home/damon/Downloads/libnvidia-container/deps/src/nvcgo VERSION=1.12.0 build
make[2]: Entering directory '/home/damon/Downloads/libnvidia-container/deps/src/nvcgo'
export CGO_CFLAGS="-std=gnu11 -O2"; \
export CGO_LDFLAGS="-Wl,--gc-sections -Wl,-s -Wl,--soname,libnvidia-container-go.so.1"; \
go build -o libnvidia-container-go.so -ldflags "-s -w" -buildmode=c-shared .
make[2]: Leaving directory '/home/damon/Downloads/libnvidia-container/deps/src/nvcgo'
make -C /home/damon/Downloads/libnvidia-container/deps/src/nvcgo install DESTDIR=/home/damon/Downloads/libnvidia-container/deps
make[2]: Entering directory '/home/damon/Downloads/libnvidia-container/deps/src/nvcgo'
install -d -m 755 /home/damon/Downloads/libnvidia-container/deps/usr/local/lib /home/damon/Downloads/libnvidia-container/deps/usr/local/include/nvcgo
install -m 755 libnvidia-container-go.so /home/damon/Downloads/libnvidia-container/deps/usr/local/lib/libnvidia-container-go.so.1.12.0
install -m 644 libnvidia-container-go.h /home/damon/Downloads/libnvidia-container/deps/usr/local/include/nvcgo/nvcgo.h
install -m 644 ctypes.h /home/damon/Downloads/libnvidia-container/deps/usr/local/include/nvcgo/ctypes.h
make[2]: Leaving directory '/home/damon/Downloads/libnvidia-container/deps/src/nvcgo'
make[1]: Leaving directory '/home/damon/Downloads/libnvidia-container'
make -f /home/damon/Downloads/libnvidia-container/mk/elftoolchain.mk DESTDIR=/home/damon/Downloads/libnvidia-container/deps install
make[1]: Entering directory '/home/damon/Downloads/libnvidia-container'
MAKEFLAGS= bmake -j 24 -C /home/damon/Downloads/libnvidia-container/deps/src/elftoolchain-0.7.1/common
MAKEFLAGS= bmake -j 24 -C /home/damon/Downloads/libnvidia-container/deps/src/elftoolchain-0.7.1/libelf
--- libelf.so.1 ---
building shared elf library (version 1)
gcc -o libelf.so.1 -shared -Wl,"-soname libelf.so.1" -Wl,--whole-archive libelf_pic.a -Wl,--no-whole-archive  
/usr/lib64/gcc/x86_64-suse-linux/12/../../../../x86_64-suse-linux/bin/ld: Error: unable to disambiguate: -soname libelf.so.1 (did you mean --soname libelf.so.1 ?)
collect2: error: ld returned 1 exit status
*** [libelf.so.1] Error code 1

bmake[2]: stopped in /home/damon/Downloads/libnvidia-container/deps/src/elftoolchain-0.7.1/libelf
1 error

bmake[2]: stopped in /home/damon/Downloads/libnvidia-container/deps/src/elftoolchain-0.7.1/libelf
make[1]: *** [/home/damon/Downloads/libnvidia-container/mk/elftoolchain.mk:44: /home/damon/Downloads/libnvidia-container/deps/src/elftoolchain-0.7.1/.build_stamp] Error 2
make[1]: Leaving directory '/home/damon/Downloads/libnvidia-container'
make: *** [Makefile:263: deps] Error 2
damon@rapthalia:~/Downloads/libnvidia-container> 

I tried replacing -soname with --soname, but that didn’t help. The libelf build files themselves do not contain soname anywhere.
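
The error message quotes "-soname libelf.so.1" as a single option, which suggests that the quoted -Wl,"-soname libelf.so.1" in the gcc command reaches ld as one argument with an embedded space. The sketch below only imitates, in plain shell, how gcc splits a -Wl option on commas; split_wl is a hypothetical helper and not part of any toolchain. It shows the difference from the comma-separated form -Wl,-soname,libelf.so.1. Whether switching to that form fixes this particular build is untested.

```shell
#!/bin/sh
# Imitate gcc's handling of -Wl,<args>: strip the prefix, split on commas,
# and print each piece that would be handed to ld as a separate argument.
split_wl() (
    set -f              # no filename expansion on the pieces
    IFS=','
    set -- ${1#-Wl,}    # unquoted on purpose: split on commas only
    printf '[%s]\n' "$@"
)

# Form used in the failing build: one piece containing a space.
split_wl '-Wl,-soname libelf.so.1'
# prints: [-soname libelf.so.1]

# Comma-separated form: option and value arrive as two pieces.
split_wl '-Wl,-soname,libelf.so.1'
# prints: [-soname]
#         [libelf.so.1]
```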

Adding NVIDIA Container Runtime to Docker Manually

I was able to download the libnvidia-* packages manually via https://pkgs.org/, and then I followed the instructions from the runtime README at https://github.com/NVIDIA/nvidia-container-runtime. Neither the systemd override nor the daemon config works: the systemd override reports an error, and the daemon config is ignored.

That’s why the nvidia-docker2 package is important: it installs the runtime integration that actually works.
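
A quick way to see whether the runtime integration is in place is to look for a runtime named nvidia in the Docker daemon config. This is a rough sketch under two assumptions: that the config lives in /etc/docker/daemon.json (the file named in the conflict message earlier) and that the runtime is registered under a "nvidia" key, as in the daemon config example from the runtime README. The has_nvidia_runtime function is hypothetical, and a text match like this is no substitute for a real JSON parser.

```shell
#!/bin/sh
# Return success if the given daemon config contains a "nvidia": key.
# Assumption: the runtime is registered under that key in daemon.json.
has_nvidia_runtime() {
    config="${1:-/etc/docker/daemon.json}"
    [ -r "$config" ] && grep -q '"nvidia"[[:space:]]*:' "$config"
}

if has_nvidia_runtime; then
    echo "nvidia runtime is registered in daemon.json"
else
    echo "nvidia runtime not found in daemon.json"
fi
```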

And that’s all.

Comments (1)

dan
Posted: 2023-08-30 21:07:10 UTC / Approved: 2024-07-30 22:31:37 UTC
This appears to no longer work. First, the .repo file moved to https://nvidia.github.io/libnvidia-container/opensuse-leap15.5/libnvidia-container.repo. But when you zypper refresh after adding the repo, you get the error:

Repository 'libnvidia-container' is invalid.
[libnvidia-container|https://nvidia.github.io/libnvidia-container/opensuse-leap15.5/libnvidia-container.repo] Valid metadata not found at specified URL
History:
 - [libnvidia-container|https://nvidia.github.io/libnvidia-container/opensuse-leap15.5/libnvidia-container.repo] Repository type can't be determined.

Please check if the URIs defined for this repository are pointing to a valid repository.
Skipping repository 'libnvidia-container' because of the above error.