Wednesday, January 04, 2012

Calling DBUS Experts...

We're expanding our GNOME desktop project, and dbus on OpenSuse 11.4 is kicking our butts. If anyone who knows this code has tips, please leave them in a comment.

What's happening is that at around 96 to 99 users, the system dbus seems to stop accepting GDM connections, and no more users can log in. When we reach this threshold, we also very often see dbus chewing lots of CPU. It seems to be receiving retries over and over again, all of which fail. /var/log/messages shows gdm crashing, and to my eyes it happens when gdm is trying to talk to dbus. I have installed the debug symbols and should get better backtraces starting tomorrow. The shot below shows the crash.

The documentation is somewhat lacking with regard to the tunable parameters. You don't always know the default settings, so it's hard to tell what should be increased. It's also not very clear which resource is failing. I set the following resources this morning, and the issue happened again:


My technique was to look through the source code, find where the defaults are set, and try doubling them. There are more parameters; any ideas which ones might help? It would be wonderful if dbus-monitor showed these kinds of failures, but it seems to show only bus activity, which doesn't really help. As soon as we drop below 96 users, everything works correctly and users can log on and off with no problems. This leads me to strongly believe we are hitting the limit of some parameter.
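dbus-monitor only shows bus traffic. One way to see whether the daemon is nearing a resource ceiling is to watch it from the outside. Below is a minimal Python sketch that counts a process's open file descriptors and compares that count with its "Max open files" limit. It assumes a Linux /proc filesystem. File descriptors are only one candidate resource, and the function names are hypothetical. Reading /proc/<pid>/fd for a process owned by another user generally requires root, and the system dbus-daemon typically runs as its own user.

```python
import os
import sys
from typing import Optional, Tuple


def open_fd_count(pid: int) -> int:
    """Count the file descriptors a process currently has open (Linux /proc)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def _parse_limit(value: str) -> Optional[int]:
    # /proc/<pid>/limits prints "unlimited" instead of a number when there is no limit.
    return None if value == "unlimited" else int(value)


def max_open_files(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) 'Max open files' limits from /proc/<pid>/limits."""
    label = "Max open files"
    with open(f"/proc/{pid}/limits") as f:
        for line in f:
            if line.startswith(label):
                # Remaining columns are: soft limit, hard limit, units.
                soft, hard = line[len(label):].split()[:2]
                return _parse_limit(soft), _parse_limit(hard)
    raise ValueError(f"no '{label}' line for pid {pid}")


def fd_headroom(pid: int) -> Optional[int]:
    """Descriptors left before the soft limit is hit (None if unlimited)."""
    soft, _hard = max_open_files(pid)
    return None if soft is None else soft - open_fd_count(pid)


if __name__ == "__main__":
    # Pass the dbus-daemon PID as the first argument; defaults to this process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    soft, hard = max_open_files(pid)
    print(f"pid {pid}: {open_fd_count(pid)} open fds, "
          f"soft limit {soft}, hard limit {hard}, headroom {fd_headroom(pid)}")
```

You could run this periodically against the system bus PID while users log in. It would show whether the open-descriptor count climbs toward the soft limit at the moment logins start failing.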

Any tips?  Drop a comment.  Thanks!

6 comments:

Havoc said...

The available limits are all listed in "man dbus-daemon" (unless someone added one without documenting it there, but probably they are all there). session.conf sets most of them explicitly. Your problem, however, is probably with the system bus (system.conf), which uses the hardcoded defaults in dbus/bus/config-parser.c. Copying the very high limits from session.conf would surely solve the problem. It might also create a theoretical local DoS security problem, but that may not be a concern in your environment.
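Havoc's suggestion amounts to comparing which limits each config file sets explicitly. Below is a minimal Python sketch of that comparison. It assumes the `<limit name="...">value</limit>` element form that dbus-daemon config files use. The helper names are hypothetical, and the values in the example strings are illustrative, not the real contents of either file. Any limit a file does not set falls back to the compiled-in defaults, which this sketch cannot see.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def explicit_limits(config_xml: str) -> Dict[str, int]:
    """Map each <limit name="..."> element in a dbus-daemon config to its integer value."""
    root = ET.fromstring(config_xml)
    return {elem.get("name"): int(elem.text.strip()) for elem in root.iter("limit")}


def limits_missing_from(reference: Dict[str, int], target: Dict[str, int]) -> Dict[str, int]:
    """Limits set in `reference` (e.g. session.conf) that `target` leaves at the defaults."""
    return {name: value for name, value in reference.items() if name not in target}


if __name__ == "__main__":
    # Illustrative snippets only. For the installed files, read each one into
    # a string first, e.g. explicit_limits(open(path).read()).
    session_xml = """<busconfig>
      <limit name="max_completed_connections">100000</limit>
      <limit name="max_connections_per_user">100000</limit>
    </busconfig>"""
    system_xml = "<busconfig></busconfig>"
    missing = limits_missing_from(explicit_limits(session_xml), explicit_limits(system_xml))
    for name, value in sorted(missing.items()):
        print(f"system.conf does not set {name} (session.conf uses {value})")
```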

Dave Richards said...

@Robert:
Perhaps our man page is old, but the limits are not listed in it. So I looked over the source again and found that the system.conf in the build tree had not been carried over into the dbus config directory on OpenSuse. I brought over all the settings, and we will reboot and test tomorrow. It would be really helpful if dbus had some spewage in dbus-monitor indicating which thresholds are being hit. Fingers crossed for tomorrow.

Anonymous said...

We designed DBUS for a single-user system, not for such a big deployment. You are on your own, buddy, or you can pay me to do the improvement.

Gábor said...

@Anonymous: this is a major problem. If anybody builds a Linux-based terminal server, this concept dies. I think you should reconsider your point of view.

Anonymous said...

Anonymous, that's awesome. I will add it to my fortune quotes.

Dave Richards said...

@All:
Good news: it's a known issue, and Vincent came to the rescue.

https://bugzilla.novell.com/show_bug.cgi?id=739743

What happens is that dbus on OpenSuse 11.4 runs out of file descriptors (FDs) at 1024, which works out to about 96 users. We will test a new RPM once it's built.
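The numbers in the bug report give a rough capacity estimate. Running out of 1024 descriptors at about 96 users works out to roughly 10 to 11 descriptors per logged-in session. Below is a small Python sketch of that arithmetic. It assumes descriptor use grows linearly with the number of users, which is an assumption and not something the bug report states. The function names are hypothetical.

```python
# Figures from the bug report: the system bus ran out of FDs at 1024,
# which was reached at about 96 users. Linear scaling is an assumption.
OBSERVED_FD_LIMIT = 1024
OBSERVED_USERS = 96


def users_supported(fd_limit: int) -> int:
    """Estimated number of users a given FD limit supports (rounded down)."""
    return fd_limit * OBSERVED_USERS // OBSERVED_FD_LIMIT


def fd_limit_needed(users: int) -> int:
    """Estimated FD limit needed for `users` sessions (rounded up via negated floor division)."""
    return -(-users * OBSERVED_FD_LIMIT // OBSERVED_USERS)


if __name__ == "__main__":
    for limit in (1024, 4096, 8192):
        print(f"fd limit {limit}: about {users_supported(limit)} users")
    print(f"200 users need a limit of at least {fd_limit_needed(200)}")
```

Under that linear assumption, a limit of 4096 would carry about 384 users, and 200 users would need a limit of roughly 2134. Real per-session usage may vary, so treat these as ballpark figures.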