Exiting GPU process because some drivers can’t recover from errors
Recently I updated my NVIDIA Linux drivers to:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4080        On  |   00000000:09:00.0  On |                  N/A |
|  0%   41C    P8              7W /  320W |    1550MiB /  16376MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
and I got these errors all the time when running Chromium:
máj 08 03:02:12 rapthalia kwin_x11[5007]: qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 62375, resource id: 79691784, major code: 18 (ChangeProperty), minor code: 0
máj 08 03:04:38 rapthalia krunner[9371]: [9371:9371:0508/030438.745549:ERROR:shared_context_state.cc(1079)] SharedContextState context lost via ARB/EXT_robustness. Reset status = GL_GUILTY_CONTEXT_RESET_KHR
máj 08 03:04:38 rapthalia krunner[9371]: [9371:9371:0508/030438.745705:ERROR:gpu_service_impl.cc(1124)] Exiting GPU process because some drivers can't recover from errors. GPU process will restart shortly.
máj 08 03:04:38 rapthalia krunner[6850]: [6850:6850:0508/030438.765977:ERROR:command_buffer_proxy_impl.cc(323)] GPU state invalid after WaitForGetOffsetInRange.
máj 08 03:04:38 rapthalia krunner[6850]: [6850:6850:0508/030438.800885:ERROR:gpu_process_host.cc(997)] GPU process exited unexpectedly: exit_code=8704
DMESG - NVRM: krcWatchdogCallbackVblankRecovery_IMPL Error
The output from dmesg:
[13517.427715] NVRM: GPU at PCI:0000:09:00: GPU-39d6bee3-b86c-946b-b921-9d8ca886556b
[13517.427724] NVRM: Xid (PCI:0000:09:00): 16, pid='<unknown>', name=<unknown>, Head 00000003 Count 000bd943
[13517.427731] NVRM: krcWatchdogCallbackVblankRecovery_IMPL: NVRM-RC: RM has detected that 7 Seconds without a Vblank Counter Update on head:D0
[13525.619416] NVRM: Xid (PCI:0000:09:00): 16, pid='<unknown>', name=<unknown>, Head 00000003 Count 000bd944
[13525.619432] NVRM: krcWatchdogCallbackVblankRecovery_IMPL: NVRM-RC: RM has detected that 7 Seconds without a Vblank Counter Update on head:D0
[13533.811249] NVRM: Xid (PCI:0000:09:00): 16, pid='<unknown>', name=<unknown>, Head 00000003 Count 000bd945
[13533.811268] NVRM: krcWatchdogCallbackVblankRecovery_IMPL: NVRM-RC: RM has detected that 7 Seconds without a Vblank Counter Update on head:D0
[13542.002862] NVRM: Xid (PCI:0000:09:00): 16, pid='<unknown>', name=<unknown>, Head 00000003 Count 000bd946
[13542.002879] NVRM: krcWatchdogCallbackVblankRecovery_IMPL: NVRM-RC: RM has detected that 7 Seconds without a Vblank Counter Update on head:D0
[13550.194621] NVRM: Xid (PCI:0000:09:00): 16, pid='<unknown>', name=<unknown>, Head 00000003 Count 000bd947
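To see how regular these events are, the Xid lines can be pulled out of the dmesg text and the gaps between them measured. This is only a minimal sketch: the regular expression is my own guess based on the lines pasted above, not an official NVIDIA log format, and the function names are made up for illustration.

```python
import re

# Pattern inferred from the dmesg lines above (an assumption, not a spec), e.g.:
# [13517.427724] NVRM: Xid (PCI:0000:09:00): 16, pid=...
XID_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s+NVRM: Xid \((PCI:[0-9a-fA-F:.]+)\): (\d+),"
)


def parse_xid_events(dmesg_text):
    """Return a list of (timestamp_seconds, pci_address, xid_code) tuples."""
    events = []
    for line in dmesg_text.splitlines():
        match = XID_RE.match(line.strip())
        if match:
            events.append(
                (float(match.group(1)), match.group(2), int(match.group(3)))
            )
    return events


def intervals(events):
    """Seconds elapsed between consecutive Xid events, rounded to ms."""
    times = [timestamp for timestamp, _, _ in events]
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


# A few lines copied from the dmesg output above.
SAMPLE = """\
[13517.427724] NVRM: Xid (PCI:0000:09:00): 16, pid='<unknown>', name=<unknown>, Head 00000003 Count 000bd943
[13517.427731] NVRM: krcWatchdogCallbackVblankRecovery_IMPL: NVRM-RC: RM has detected that 7 Seconds without a Vblank Counter Update on head:D0
[13525.619416] NVRM: Xid (PCI:0000:09:00): 16, pid='<unknown>', name=<unknown>, Head 00000003 Count 000bd944
[13533.811249] NVRM: Xid (PCI:0000:09:00): 16, pid='<unknown>', name=<unknown>, Head 00000003 Count 000bd945
"""

if __name__ == "__main__":
    events = parse_xid_events(SAMPLE)
    print(intervals(events))
```

On the lines above, the Xid 16 events land roughly 8.2 seconds apart, which fits a watchdog that fires after 7 seconds without a vblank update and then retries.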
It is supposed to be fixed in 555.42.02 (https://github.com/NVIDIA/open-gpu-kernel-modules/issues/632), yet I still see it on 550.54.15, the driver I just updated to, which is still older than 555.42.02. The issue is also still open, so there might be something else going on.
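A quick way to check whether the installed driver is at or above the release that is said to carry the fix is to compare the version strings numerically. A minimal sketch, assuming NVIDIA driver versions are always dot-separated integers; `parse_version` and `has_fix` are names I made up for illustration.

```python
def parse_version(version):
    """Turn '550.54.15' into (550, 54, 15) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def has_fix(installed, fixed_in):
    """True if the installed driver is at or above the release said to carry the fix."""
    return parse_version(installed) >= parse_version(fixed_in)


if __name__ == "__main__":
    # Versions taken from this post: installed driver vs. the reported fix release.
    print(has_fix("550.54.15", "555.42.02"))  # prints False
```

Comparing tuples of integers avoids the trap of plain string comparison, which would get versions with different digit counts wrong.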
Why now? How? I don’t know. Either way, it freezes the Chromium window or the whole desktop for 30 seconds to a minute. I thought NVIDIA drivers were, while very proprietary, rock solid, but it seems I’m fucked. Others wrote that this error is not caused by Chromium itself but is a side effect of a condition in the drivers. Why can’t we just have stable graphics drivers when I want to use CUDA or OpenCL? Man, it was a nightmare in 2005, in 2010 it got much better, but still, here we are, 15 years later. If it were open source, others could at least take a look and fix stuff.
The best bet is to reinstall the operating system.