My desktop decided to spontaneously restart today, a repeat of what happened a few weeks ago.
The short-term memory replacement (Twitter) tracked the occasion:
http://twitter.com/rodneygedda/status/8616389858
No warning, nothing. Badda bing, badda bang… blank screen, some hard disk seeking, then KDM restarted. Good thing I save my work regularly and both Konqueror and Firefox have session restoration!
I haven’t had a chance to do any debugging yet, but I inevitably will, as this type of failure is unacceptable if Linux is going to have a snowflake’s chance in hell of providing a credible alternative on the desktop.
Mind you, my notebook does get hammered every day in the form of multiple suspend-to-RAM cycles, compositing (KWin, not Compiz) and scores of applications running at the same time.
So when I say my desktop crashed, I’m really saying “my desktop has been going for two weeks straight (no restarts, just suspensions) and decided to have a hiccup”. I have no idea what my record is, but it could well be over a month before a kernel update demanded a reboot.
Any debugging tips will be greatly appreciated (Kubuntu Karmic).
Without pointing the finger at X.Org, does it add complexity the modern Linux desktop could do without? Not that it’s unstable at all (in fact, X has always been a model of stability and robustness when configured correctly), but does it let Linux distributions down in an area where Windows and Mac OS X shine?
So far the logs haven’t told me much. I’ll wait until it really becomes a problem before I start digging furiously.
I guess one place to look would be at the end of /var/log/Xorg.0.log.old, which is the log of the just exited X server (assuming it did exit).
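A minimal Python sketch of that check, assuming the log path named above and X.Org’s usual “(EE)” (error) and “(WW)” (warning) line markers. The function name `tail_errors` and the 100-line window are my own illustrative choices. Backtrace lines may not carry a marker at all, so it is still worth reading the raw tail of the file.

```python
import sys
from collections import deque
from pathlib import Path

# Log of the previous X server, as mentioned above; adjust if your display number differs.
DEFAULT_LOG = Path("/var/log/Xorg.0.log.old")


def tail_errors(lines, tail=100, markers=("(EE)", "(WW)")):
    """Return the lines among the last `tail` lines that contain any of `markers`."""
    last = deque(lines, maxlen=tail)  # keeps only the final `tail` lines
    return [line.rstrip("\n") for line in last if any(m in line for m in markers)]


def main(argv):
    # Optional first argument overrides the log path.
    path = Path(argv[1]) if len(argv) > 1 else DEFAULT_LOG
    try:
        with path.open(errors="replace") as fh:
            hits = tail_errors(fh)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        return 1
    for line in hits:
        print(line)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Run it straight after KDM comes back, before the next X restart rotates the log again.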
You might be curious about my effort to get a simple NULL pointer check into X11 through the Ubuntu Launchpad bug tracker.
https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/408016
That’s after I provided both a program that demonstrates the problem and a suggested patch adding a NULL pointer check (a very simple patch that anyone with a tiny bit of programming skill could verify as correct).
What I’ve found is that it’s bloody difficult to get people to take crash bugs seriously. Mind you, I’m not a paying customer, so I guess maybe I should get a subscription… however, I’ve moved back to CentOS for the time being: fewer features but more stability. That is one little problem with the paid support business model. If you provide good support for free, no one needs paid support, but if your free support is shit then it doesn’t impress anyone enough to pay you either.
Check out the xserver-xorg-video-intel bug list and look for the words “freeze”, “crash”, “lockup”, “hung”, “segfault”, etc. Most of those bugs are older than three months, some as old as six months.
If you can reliably make the crash happen, then following these steps will get you into the debugger, where you can get a backtrace and/or have a look at it yourself.
https://wiki.ubuntu.com/X/Backtracing
If you cannot reliably make the thing crash, you might find that the apport package can save the core dump at the time of the crash and generate a bug report… I’ve not tried apport, but maybe it’s a way to get your bugs looked at faster *shrug*
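If apport is enabled, a quick way to see whether it caught anything is to list its most recent reports. This minimal Python sketch assumes apport’s default report directory on Ubuntu (/var/crash) and its “.crash” file extension. The function name `recent_crash_reports` and the limit of five are illustrative choices, not part of apport itself.

```python
from pathlib import Path

# Assumption: apport's default report directory on Ubuntu.
CRASH_DIR = Path("/var/crash")


def recent_crash_reports(directory=CRASH_DIR, limit=5):
    """Return up to `limit` *.crash files in `directory`, newest first."""
    if not directory.is_dir():
        return []  # apport not installed, or nothing has crashed yet
    reports = sorted(
        directory.glob("*.crash"),
        key=lambda p: p.stat().st_mtime,  # newest modification time first
        reverse=True,
    )
    return reports[:limit]


if __name__ == "__main__":
    for report in recent_crash_reports():
        print(report)
```

A report that turns up here around the time of the restart is a good hint about which process (X itself or something else) went down.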