While working on OS X I got used to having the Shark system profiler at my fingertips. I love being able to see what's going on in a live process: where all the threads are stuck, and what's taking up all the time on my system.
On Linux you can use the OProfile kernel module, or the commercial Zoom profiler (which, I believe, uses a modified OProfile under the hood).
However, if these aren't available to you, you can attach to your process using gdb and manually Ctrl-C, backtrace, and continue to get a feel for what's going on. This is suggested in several posts on Stack Overflow (here and here).
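The manual version of this looks something like the following transcript (the PID 123456 is a placeholder for your own process):

```shell
# Attach to the running process by PID; gdb stops the process on attach:
#   $ gdb -p 123456
# At the (gdb) prompt, repeat this cycle a few times:
#   (gdb) thread apply all backtrace    # dump every thread's stack
#   (gdb) continue                      # let the process run again
#   ... wait a moment, then press Ctrl-C to stop it for the next sample ...
# When done:
#   (gdb) detach
#   (gdb) quit
```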
A neater way to do this, pausing the application only briefly, is:
gdb -batch -x stackdumper.gdb ./a.out 123456 > stack.0
where ./a.out is the binary you are interested in and 123456 is the PID of the running process.
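If you don't already know the PID, pgrep will find it by process name. Here a.out stands in for whatever your binary is called:

```shell
# print the PID of the newest process named exactly a.out
pgrep -n -x a.out
```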
If you set stackdumper.gdb to contain
thread apply all backtrace
then you'll get a backtrace for all threads. The advantage of this over the manual method is that the binary is stopped for as little time as possible.
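To approximate a sampling profiler, you can wrap the one-shot dump in a loop. This is a sketch rather than anything from the original workflow: the sample_stacks helper name, the one-second interval, and the use of gdb's -ex flag (which passes the command inline instead of via stackdumper.gdb) are all my own choices.

```shell
#!/bin/sh
# sample_stacks PID [N]: take N one-shot stack dumps of PID, roughly one
# second apart, writing them to stack.0 .. stack.(N-1). Hypothetical helper.
sample_stacks() {
    pid=$1
    n=${2:-10}
    i=0
    while [ "$i" -lt "$n" ]; do
        # -batch runs the commands, then detaches and exits, so the
        # process is only stopped for the duration of the backtrace
        gdb -batch -ex 'thread apply all backtrace' -p "$pid" > "stack.$i"
        i=$((i + 1))
        sleep 1
    done
}
```

Afterwards, something like grep pthread_cond_wait stack.* gives a quick count of how many samples caught each thread blocked.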
I used this to find that all our threads were waiting on some JSON writing code that should have been fast.
That is, across roughly 10 runs of the sampler, each sample showed one thread deep in JSON decoding and 2-7 other threads waiting in pthread mutex/condition code.