Monday, February 14, 2011

Poor-man sampler saves the day again.

So one of my applications was being very slow to start up when running in a VM.
Running the poormans sampler detailed in my previous post. I found that I got a heap of samples looking like this:

Thread 1 (Thread 0xb6f436f0 (LWP 1657)):
#0  0xb7828430 in __kernel_vsyscall ()
#1  0xb7370f93 in read () from /lib/tls/i686/cmov/
#2  0xb731aedb in _IO_file_underflow () from /lib/tls/i686/cmov/
#3  0xb731dcc8 in __underflow () from /lib/tls/i686/cmov/
#4  0xb731a888 in ?? () from /lib/tls/i686/cmov/
#5  0xb731c7b8 in _IO_sgetn () from /lib/tls/i686/cmov/
#6  0xb73103be in fread () from /lib/tls/i686/cmov/
#7  0x0811028f in main (argc=2, argv=0xbffccf54) at src/server.cpp:252

That is I was waiting for a read to complete. Sometimes for 5-10 seconds, sometimes for a few minutes.
What was this troublesome read? An unexpected socket call? Nope - something I would never have expected. Here's the "offending" code.

FILE * f = fopen("/dev/random", "r");
unsigned seed;
fread( &seed, sizeof(seed), 1, f);
This code seeds the random number generators using some values pulled from the random device /dev/random.

Reading from /dev/random blocks until there is enough entropy in its internal entropy pool to complete the read. On an isolated VM there's not much system noise generating entropy and so the pool was emptying quickly.

Turns out theres a non-blocking random device that uses feedback of hashed values to prevent blocking when the entropy pool is low, so switching this to using "/dev/urandom" and all was OK.

Now maybe I could have found this by doing some traditional debugging, (maybe not as that may have generated enough noise to stop /dev/random blocking!) But using dumps from gdb the problem was found in a few minutes of time.

Sunday, February 6, 2011

Poor mans sampling profiler for live processes using GDB

While working on OSX I've got used to having the shark system profiler at my fingertips. I love being able to see what's going on in a live process, where all the threads are stuck. and what's taking up all the time on my system.

On linux you can use the oprofile kernel module, or the commercial zoom profiler (which uses a modifier oprofile under the hood I believe)

However if these aren't available to you then you can attach to your process using gdb and manually CTRL-C and backtrace / continue to get a feel for what's going on. This is suggested in several posts on stackoverflow( here and here )

A neater way to do this without ever pausing the application is..

gdb -batch -x stackdumper.gdb ./a.out 123456 > stack.0

where ./a.out is the binary you are interested and 123456 is the PID.

If you set stackdumper.gdb to contain

thread apply all backtrace

Then you'll get a backtrace on all threads. The advantage of this over the manual method is that the binary is stopped for as little time as possible.

I used this to find that all our threads were waiting on some JSON writing code that should have been fast.
i.e. a sample of about 10 runs of the sampler showed one thread deep in json decoding and 2-7 other threads all waiting in pthread mutex / condition code.