My Opinion on Glibc 2.x
TLDR: Glibc, get your shit together! You’re becoming obsolete (to musl I guess), you are hard to work with. Stable threading is a must these days.
For the last couple of months I have really tried to finish libnss-maria, but it’s been painful. I can handle the almost non-existent documentation, where all you get is man pages and reading the source code.
What I don’t like are impossible challenges that I just cannot fix. My library is a name service (NSS) module that serves user data from a database. In short: there’s an enumeration function that returns all users from the database, one by one.
But for that I need to share state between calls somewhere, right? Yes, in a variable. But the documentation doesn’t say whether that is safe. When glibc is compiled without pthreads, does it wait for getpwent to finish all its requests before another caller gets in, or not? No idea.
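To make the problem concrete, here is a minimal sketch of what I mean by shared state. It is not the actual libnss-maria code: the `_nss_example_*` names, the `fake_users` array standing in for database rows, and the `cursor` variable are all made up for illustration. The mutex is taken unconditionally, and whether that is safe or even sufficient when the host process is not linked against libpthread is exactly the thing I cannot find out.

```c
#include <errno.h>
#include <nss.h>
#include <pthread.h>
#include <pwd.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for rows fetched from the database. */
static const char *const fake_users[] = { "alice", "bob", "carol" };
#define FAKE_USER_COUNT (sizeof fake_users / sizeof fake_users[0])

/* The shared enumeration state: a cursor that survives between calls. */
static size_t cursor = 0;

/* Assumption: the module guards the cursor itself instead of
 * relying on glibc to serialize callers. */
static pthread_mutex_t cursor_lock = PTHREAD_MUTEX_INITIALIZER;

/* Rewind the enumeration to the first user. */
enum nss_status _nss_example_setpwent(void)
{
    pthread_mutex_lock(&cursor_lock);
    cursor = 0;
    pthread_mutex_unlock(&cursor_lock);
    return NSS_STATUS_SUCCESS;
}

/* Hand out the next user, one per call, using caller-provided storage. */
enum nss_status _nss_example_getpwent_r(struct passwd *result, char *buffer,
                                        size_t buflen, int *errnop)
{
    enum nss_status status;

    pthread_mutex_lock(&cursor_lock);
    if (cursor >= FAKE_USER_COUNT) {
        /* Nothing left to enumerate. */
        *errnop = ENOENT;
        status = NSS_STATUS_NOTFOUND;
    } else {
        size_t need = strlen(fake_users[cursor]) + 1;
        if (need > buflen) {
            /* Buffer too small: report it and do not advance the cursor. */
            *errnop = ERANGE;
            status = NSS_STATUS_TRYAGAIN;
        } else {
            memcpy(buffer, fake_users[cursor], need);
            /* Only pw_name, pw_uid and pw_gid are filled in this sketch. */
            memset(result, 0, sizeof *result);
            result->pw_name = buffer;
            result->pw_uid = (uid_t)(1000 + cursor);
            result->pw_gid = (gid_t)(1000 + cursor);
            cursor++;
            status = NSS_STATUS_SUCCESS;
        }
    }
    pthread_mutex_unlock(&cursor_lock);
    return status;
}

/* Finish the enumeration and reset the state. */
enum nss_status _nss_example_endpwent(void)
{
    pthread_mutex_lock(&cursor_lock);
    cursor = 0;
    pthread_mutex_unlock(&cursor_lock);
    return NSS_STATUS_SUCCESS;
}
```

Even with the lock, two callers enumerating at the same time would still steal entries from each other, because there is only one cursor. The lock only keeps the variable itself consistent.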
I wanted to use pthreads, but I heard it causes loads of race conditions if glibc is not compiled with pthreads and only the module is. Later I found the evidence. Fuck!
There are times when I hate being a software developer. Why am I inflicting this pain on myself in my free time anyway? Maybe I could ask on stackoverflow.com or on a mailing list.
Hmmm, actually it is a good idea. I’m just afraid I won’t get an answer.
Yes, uncertainty. The old libnss-mysql does some locking, but hell, it doesn’t work, and race conditions wreak havoc on my server. Apache locks up a lot and needs to be restarted every couple of days.
The source code of glibc is very complex. Lots of macros are used, and I now believe those who say glibc is too complex and hard to work with. It is.
glibc thread races. Posted Aug 17, 2015 19:57 UTC (Mon) by wahern (subscriber, #37304). Parent article: Glibc 2.22 released
One of the biggest ongoing problems with glibc is its threading support. If you dlopen a library that links in pthread for the first time, there are races all over the place. glibc tries to make it work, but there are a ridiculous number of open bugs about race conditions. I just opened one this year: dlerror (and by association dlopen and dlclose) aren’t thread-safe in this scenario. Most of the NSS code is broken for similar reasons.
They either need to fix it or simply stop trying to make it work. Instead, these very serious threading bugs are left to languish because the fixes require a significant overhaul or a significant change to the documented semantics. Everybody will complain loudly if they officially stop supporting this. But they don’t have the time to make it work properly. So instead you end up with applications that are fundamentally broken, with most developers and users ignorant of how broken their applications are.
OS X, Solaris, musl-libc, and others solved the issue by simply incorporating libpthread into libc. On Linux, interpreters like Perl and Python are usually linked with libpthread unconditionally, with most people none the wiser, despite the fact that doubtless most people, were they given a veto, would have violently opposed such a move out of exaggerated concern for single-threaded performance. I ran into the above bug because no distribution links its Lua interpreter with libpthread, which means loading a Lua module that uses threading is broken, although it will appear to work in many cases (especially if you don’t unload modules before exiting.) I ignored the Valgrind bug reports as false positives until I was forced to track down persistent segfaults on a production server. The bug report is still marked as NEW. Sigh.
Given the pervasive use of multi-threading, and the sheer difficulty of trying to optimize for the non-threaded case without introducing a metric ton of difficult to detect races in threaded code, glibc should just assume it’s always running threaded and optimize accordingly to minimize the cost for single-threaded apps. Most of the current code which attempts to detect threading is manifestly broken, and has been for years.
They don’t have to literally merge libpthread, just unconditionally make use of the mutexes it already has in place.
I really hoped glibc would be threaded on my Debian or Ubuntu so that I could just ignore other platforms... but it isn’t. So I’m back to the uncertainty about whether getpwent gets to finish all its queries.
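For reference, this is roughly what the caller's side of that enumeration looks like. It is only a sketch, and the function name `count_passwd_entries` is mine. As far as I understand the man page, plain getpwent may return a pointer to a static area and is not thread-safe, so everything between setpwent and endpwent depends on nobody else touching the same reading position in the meantime.

```c
#include <pwd.h>
#include <stddef.h>

/* Walks the whole passwd database through whatever NSS modules are
 * configured and returns how many entries it saw. */
size_t count_passwd_entries(void)
{
    size_t count = 0;
    struct passwd *pw;

    setpwent();                      /* rewind to the first entry */
    while ((pw = getpwent()) != NULL) {
        /* pw may point to static storage that the next call overwrites */
        count++;
    }
    endpwent();                      /* release the enumeration state */

    return count;
}
```

If a second thread runs the same loop at the same time, I would expect each of them to see only part of the list, which is the kind of thing my module cannot protect itself against.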
I need a break from this hell.