The Robots Brain: 2011

Tuesday, July 5, 2011

Examining multiple cores with gdb and grep

So you've got a bunch of core files from a failing application. You look in one of them and find what's causing that particular error. However, you're not sure whether all the cores are caused by the same issue.

The application I was having trouble with was a multithreaded app, so I couldn't rely on the stacks being identical every time... so this is what I did.

stackdumper.gdb Then in bash:

In my case I found that string in all 80 cores, so I know that was the only issue.
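The gdb script and bash commands from this post are not shown above, so here is only a rough sketch of the same idea, written in Python rather than bash. It assumes stackdumper.gdb contains something like "thread apply all backtrace" (as in the February post further down), and the function names, the binary path and the signature string are all hypothetical.

```python
import subprocess
from pathlib import Path


def dump_stacks(binary: str, core: Path, gdb_script: str = "stackdumper.gdb") -> str:
    """Run gdb in batch mode against one core file and return what it printed."""
    result = subprocess.run(
        ["gdb", "-batch", "-x", gdb_script, binary, str(core)],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


def cores_missing(dumps: dict[str, str], signature: str) -> list[str]:
    """Return the names of the dumps that do NOT contain the signature string."""
    return sorted(name for name, text in dumps.items() if signature not in text)
```

One way to use it would be to build `dumps = {c.name: dump_stacks("./a.out", c) for c in Path(".").glob("core.*")}` and then call `cores_missing(dumps, "suspect_function")`. An empty list means every core shows the same signature.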

Thursday, June 2, 2011

Javascript databindings with observers

This is not quite as clear as the previous post, but here's a version of my JavaScript prototyping with observers on the data. Again, you can play with it using jsfiddle, or here's the code.

Creating data bindings using javascript closures

I've been working on a JavaScript codebase that stores all its data in forms, hiding and showing all the form elements as the view changes. This worked well while the structure of the data was relatively rigid. But now that we've got dynamically structured (tree-like) data, we need to be able to grow and shrink the views in more flexible ways.

While I'm not a gung-ho MVC advocate, it certainly looked like the app could do with a bit of an MVC style cleanup (the data was in the view rather than being accessed by the view).

The problem was it wasn't clear how to get the data out of the view... here's a simplified version of the method I'm planning to use.

You can also play with this using jsfiddle

I'm thinking about implementing broadcasting of changes and a more model-based approach later... but maybe that'll have to be another post.

Wednesday, May 18, 2011

Git's prepare-commit-msg hook

So I often write bad commit messages. At best they're inconsistent - sometimes with ticket ids at the start sometimes with ticket ids at the end. Sometimes only a short message, sometimes a nicely formatted bullet point list.

Thankfully git has a tool to help you get these things consistent. It's called the "prepare-commit-msg hook".

I wrote a little python script to make a default commit message. It's not perfect but it should help... It takes the branch name, searches for a version tag and removes it, then searches for anything that might be a ticket id and adds it to the message. Finally it adds some boilerplate.
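The script itself isn't shown here, so the following is only a minimal sketch of the steps just described, not the original. The version-tag and ticket-id patterns, the placeholder boilerplate and the function names are all assumptions made for illustration.

```python
import re
import subprocess

# Assumed conventions (not from the original script): version tags look like
# "v1.2" or "v1.2.3", and ticket ids look like "ABC-123".
VERSION_RE = re.compile(r"v\d+(?:\.\d+)+")
TICKET_RE = re.compile(r"[A-Z]+-\d+")


def default_message(branch: str) -> str:
    """Build a default commit message from a branch name."""
    without_version = VERSION_RE.sub("", branch)
    tickets = TICKET_RE.findall(without_version)
    prefix = " ".join(tickets)
    summary = f"{prefix}: <summary>" if prefix else "<summary>"
    # Placeholder boilerplate; a real hook would use the team's own template.
    return summary + "\n\n* <what changed>\n* <why>\n"


def main(argv: list[str]) -> None:
    """Entry point for the hook: argv is the script's own argument list."""
    # Git passes the path of the commit message file as the first argument.
    msg_path = argv[1]
    # A second argument (message, template, merge, squash, commit) means git
    # already has a message source, so leave it alone.
    if len(argv) > 2 and argv[2]:
        return
    branch = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    with open(msg_path) as f:
        existing = f.read()
    with open(msg_path, "w") as f:
        f.write(default_message(branch) + existing)
```

To try it, the file would be saved as .git/hooks/prepare-commit-msg, made executable, given a python shebang line, and finished with a call to `main(sys.argv)` (plus `import sys`).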

Command line git trees

Often I'm stuck in a terminal and want to see the commit history in a nice tree format. There's a nice solution: git log recently learnt the "--graph" option.

You can get pretty nice results from "--graph --oneline"

But using the tricks from http://www.jukie.net/bart/blog/pimping-out-git-log we can get a much nicer result - including author and branches and time all coloured nicely

Friday, April 29, 2011

Rendering maths from a web page

Occasionally I do pretty math-intensive coding.
And one of the best pieces of software for writing about maths is TeX.
It was written by one of the gurus/fathers of modern computer science in 1978, and is still in use all over the world. When it came to putting math on the web, the workflow used to look like this:

render math using tex
copy and paste somewhere
add an image link to your docs.

However, some clever people have made that process so much easier. Two efforts in particular look very nice: MathJax and the Google Chart API.

Here's some MathJax: $ \nabla \times \vec{\rho} = 0 $
Here's the same thing using the Google API: $\nabla \times \vec{\rho} = 0$

I think the mathjax is nicer, but the google charts version is easier to integrate.

Friday, April 1, 2011

Using saru with googlemock and googletest

Saru (earlier blog post) is my little testing framework. It does everything we need. It's been used in several serious software development situations. But sometimes things are a little painful.

For example, saru doesn't come with any nice mocking helpers for C++. There is a basic C++ testing library that comes with it, but writing your own mocks by hand is one of those painful things I mentioned.

Thankfully there's a nice mocking library for C++ from the folks at google, called, oddly enough, googlemock. However, googlemock is designed to work with googletest - the google testing framework.

Googletest is also great. But it's orthogonal to saru, rather than competitive. Saru is cross-language and designed to be more of a test-running wrapper, while googletest is a C++ unit testing library.

So I had three options if I wanted to use google-mock with my code and saru.

Make the google-mocks work with the saru-cxx library.
Make google-test output in a format that saru could digest.
Make saru able to parse google-test output.

IMO the third is the wisest and most extensible option. Luckily the changes were pretty easy.

So now getting a google-test file working in saru is as trivial as adding a

//SARU : Format gtest

to the top of the test file... and everything just works :)
(Well, you'll need to make sure the compiler can find the right includes and the gtest library, but that's all.)

For example I get this kind of output when running a test suite.

99-misc-00-periodic-processor.cpp::TestFixture::test_process_many : OK
gmock_test.cpp::PartyTest.CallsDance : OK
gmock_test.cpp::PartyTest.CallsDanceFails : FAILED
==MESSAGE==

==STDERR==
gmock_test.cpp:69: Failure
Value of: p.party()
  Actual: false
Expected: true
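To give a feel for what parsing google-test output involves, here is a small sketch in Python. It is not saru's actual implementation. It assumes the standard googletest console markers ("[ RUN      ]", "[       OK ]", "[  FAILED  ]"), only handles simple test names, and the function names are made up for this example.

```python
import re

RUN_RE = re.compile(r"^\[ +RUN +\] (\S+)$")
# Only per-test result lines carry a "(N ms)" suffix; the summary lines that
# googletest prints at the end do not, so they are ignored.
END_RE = re.compile(r"^\[ +(OK|FAILED) +\] (\S+) \(\d+ ms\)$")


def parse_gtest(output: str) -> list[tuple[str, str, str]]:
    """Return (test name, status, captured text) for each test that ran."""
    results = []
    current = None
    captured: list[str] = []
    for line in output.splitlines():
        run = RUN_RE.match(line)
        end = END_RE.match(line)
        if run:
            current, captured = run.group(1), []
        elif end and end.group(2) == current:
            results.append((current, end.group(1), "\n".join(captured)))
            current = None
        elif current is not None:
            # Anything printed between RUN and the result belongs to the test.
            captured.append(line)
    return results


def report(filename: str, output: str) -> str:
    """Format the results roughly like the test-runner output shown above."""
    return "\n".join(
        f"{filename}::{name} : {status}" for name, status, _ in parse_gtest(output)
    )
```

The captured text is what would end up under a heading like ==STDERR== for a failing test.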

Wednesday, March 2, 2011

Introducing the fastjson library

So using my poor man's sampler (and the awesome Shark tool on OS X) we found that my application spends a lot (90%) of its time serializing and deserializing JSON. That's "not good".

I've had a lot of pain finding a good JSON library for C or C++. Some are one-way (encode or decode only). Some won't handle large numbers (uint64_t at least). Some can't handle unicode.

The boost version based on boost::property_tree looked very promising at first. But it has a lot of nasty edge cases (unicode is broken unless you're using wchar_t, value nodes lose their type and become strings, and certain edge cases won't serialize properly). We managed to work around all these issues, patching the boost libraries and putting hacks into our code. But the insurmountable problem is that it is SLOW.

Now I could look for another library and work around its idiosyncrasies, but boost is already the second library that we've tried to work around.. and none of the other libraries look promising (or have licenses we can work with...)

What can you do when one of your core libraries is causing most of your pain (bugs, performance, hacks)? Rewrite it ;)

This is not a path we took lightly. But we have an alpha/beta working version of the library, and it's about to go onto our live servers. The code is a bit disorganized and could do with a bit of a clean up, but our goals have been reached. It seems to parse and write JSON about 20x faster than our corresponding boost-wrapped code. It performs far fewer allocations. It doesn't throw away type information. It does support arbitrary sized numbers. It does the right thing with unicode (except one edge case of properly converting some UTF8 into surrogate escaped UTF16 pairs).
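For anyone wondering what that surrogate edge case is about: JSON \u escapes only hold 16 bits, so a code point above U+FFFF has to be written as a UTF-16 surrogate pair. The sketch below shows the standard arithmetic in Python. It is an illustration only, not code from fastjson, and the function name is made up.

```python
def json_escape_codepoint(cp: int) -> str:
    """Escape one Unicode code point as JSON-style 16-bit hex escapes."""
    if cp < 0x10000:
        return f"\\u{cp:04x}"
    # Code points above the Basic Multilingual Plane need a surrogate pair:
    # subtract 0x10000, then split the remaining 20 bits into two halves.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return f"\\u{high:04x}\\u{low:04x}"
```

For example, U+1F600 has offset 0xF600, which splits into 0x3D and 0x200, giving the pair D83D and DE00.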

Anyway I'll do a proper post on how to use it soon, but for now here's the github link:

https://github.com/mikeando/fastjson

I'd love people to play with it and leave feedback.

There's also a post about it on my company blog.

Monday, February 14, 2011

Poor man's sampler saves the day again

So one of my applications was being very slow to start up when running in a VM.
Running the poor man's sampler detailed in my previous post, I found that I got a heap of samples looking like this:

Thread 1 (Thread 0xb6f436f0 (LWP 1657)):
#0 0xb7828430 in __kernel_vsyscall ()
#1 0xb7370f93 in read () from /lib/tls/i686/cmov/libc.so.6
#2 0xb731aedb in _IO_file_underflow () from /lib/tls/i686/cmov/libc.so.6
#3 0xb731dcc8 in __underflow () from /lib/tls/i686/cmov/libc.so.6
#4 0xb731a888 in ?? () from /lib/tls/i686/cmov/libc.so.6
#5 0xb731c7b8 in _IO_sgetn () from /lib/tls/i686/cmov/libc.so.6
#6 0xb73103be in fread () from /lib/tls/i686/cmov/libc.so.6
#7 0x0811028f in main (argc=2, argv=0xbffccf54) at src/server.cpp:252

That is, I was waiting for a read to complete. Sometimes for 5-10 seconds, sometimes for a few minutes.
What was this troublesome read? An unexpected socket call? Nope - something I would never have expected. Here's the "offending" code.

FILE * f = fopen("/dev/random", "r");
unsigned seed;
fread( &seed, sizeof(seed), 1, f);
fclose(f);
srandom(seed);

This code seeds the random number generators using some values pulled from the random device /dev/random.

Reading from /dev/random blocks until there is enough entropy in its internal entropy pool to complete the read. On an isolated VM there's not much system noise generating entropy and so the pool was emptying quickly.

Turns out there's a non-blocking random device that uses feedback of hashed values to prevent blocking when the entropy pool is low, so I switched this to use "/dev/urandom" and all was OK.
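The same fix, sketched in Python for anyone doing this kind of seeding in a script (the function name is made up, and it assumes a Unix-like system where /dev/urandom exists):

```python
import random


def seed_from_urandom(num_bytes: int = 4) -> int:
    """Seed Python's random module with bytes read from /dev/urandom."""
    # Unlike /dev/random in the post above, this read does not wait for the
    # entropy pool to refill.
    with open("/dev/urandom", "rb") as f:
        data = f.read(num_bytes)
    seed = int.from_bytes(data, "little")
    random.seed(seed)
    return seed
```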

Now maybe I could have found this by doing some traditional debugging (or maybe not, as that may have generated enough noise to stop /dev/random blocking!). But using dumps from gdb, the problem was found in a few minutes.

Sunday, February 6, 2011

Poor man's sampling profiler for live processes using GDB

While working on OSX I've got used to having the Shark system profiler at my fingertips. I love being able to see what's going on in a live process, where all the threads are stuck, and what's taking up all the time on my system.

On linux you can use the oprofile kernel module, or the commercial zoom profiler (which uses a modified oprofile under the hood, I believe).

However, if these aren't available to you then you can attach to your process using gdb and manually CTRL-C and backtrace / continue to get a feel for what's going on. This is suggested in several posts on Stack Overflow.

A neater way to do this, without manually pausing the application, is:

gdb -batch -x stackdumper.gdb ./a.out 123456 > stack.0

where ./a.out is the binary you are interested in and 123456 is the PID.

If you set stackdumper.gdb to contain

thread apply all backtrace

Then you'll get a backtrace on all threads. The advantage of this over the manual method is that the binary is stopped for as little time as possible.

I used this to find that all our threads were waiting on some JSON writing code that should have been fast.
i.e. a sample of about 10 runs of the sampler showed one thread deep in json decoding and 2-7 other threads all waiting in pthread mutex / condition code.
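Eyeballing ten dumps works, but with more samples it helps to count which call stacks keep turning up. Here is a rough Python sketch of that. It assumes the dumps look like the gdb output of "thread apply all backtrace", and the function names are my own invention for this example.

```python
import re
from collections import Counter

# Matches frames such as "#1 0xb7370f93 in read () from libc.so.6" as well as
# frames printed without an address, such as "#0 main () at foo.c:3".
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-f]+ in )?(.+?) \(")


def stacks(dump: str) -> list[tuple[str, ...]]:
    """Split one gdb dump into per-thread tuples of function names."""
    result: list[list[str]] = []
    for line in dump.splitlines():
        if line.startswith("Thread "):
            result.append([])
        else:
            frame = FRAME_RE.match(line)
            if frame and result:
                result[-1].append(frame.group(1))
    return [tuple(s) for s in result if s]


def count_stacks(dumps: list[str]) -> Counter:
    """Count how often each distinct call stack appears across all samples."""
    return Counter(stack for dump in dumps for stack in stacks(dump))
```

With the dumps saved as stack.0, stack.1 and so on, `count_stacks([open(p).read() for p in paths]).most_common(5)` lists the stacks the threads were most often caught in.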

Thursday, January 27, 2011

Lean Development : Focus on Learning

So continuing on with my notes on Lean Development, we're going to look at point 3 from Tom Poppendieck's business card.

3. Focus on Learning - Scientific Method; Challenge Standards; Feedback; Continuous Improvement

Let's start from the back of the list and work to the front, as I find that to be the logical progression of the sub-points.

Continuous Improvement is about continually tweaking your production pipeline and/or the product so that you become more efficient in producing what the customer wants.

Feedback is how you evaluate the changes you make to your pipeline. If your metrics improve, then accept the change; if not, work out why they didn't and undo the change.

Challenge Standards - This is a cultural attitude of not accepting the status quo. It's the need for the people involved to have the desire and ability to change the system. If something is a standard then there needs to be a reason why. If not, it's arbitrary and subject to change (if the change can be demonstrated to be beneficial).

Scientific Method - This is how we go about getting change implemented. First you hypothesise what measurable results the change will produce, then you make the change. If what you hypothesised didn't happen then you need to undo it and reevaluate. "Switching to a frooble compiler should reduce test execution times by 50%, increasing throughput by 3 story-points per week". Results and experiments should be documented somewhere so that the company can learn from what was done (positive or negative).

If we do these things we will understand how our process works and will have a system so that the process can change to meet the changing environment in which the development pipeline exists.

Monday, January 24, 2011

Lean Development : Build Quality In

This is the fourth post in my mini-series about Lean Development.

The bullet point I'm going to be writing about today is:

Mistake-Proof with TDD; Write No New Legacy; Continuous Integration

TDD is "Test Driven Design". This means not only testing all your code, but writing your tests before you code. And letting your design be driven by the issues that arise while making these tests pass. This also holds for bug fixing -- create a test case that reproduces and narrows down the bug, then make that test pass.

"Write No New Legacy" means don't write code that is hard to maintain. This means modular and easily testable code. Make sure the hard bits of the code are documented.

Continuous Integration means that your code should be built and tested on every check-in. Broken builds should be addressed immediately. Implicit in this is some kind of version control system. Often the CI run covers a larger batch of tests than the specific unit / functional tests used while designing a single feature. Ideally it will run on multiple test systems, one for each system you deploy to.

One idea behind these points is to make it hard for an unnoticed error to reach production. TDD makes it difficult to make the error in the first place. "Write No New Legacy" means you shouldn't have tricky untestable code paths to trip you up. "Continuous Integration" means you should never have a "broken system" on your hands.

The bigger idea behind these points is continuous improvement of the code-base. If all your tests from TDD get plugged into the continuous integration tests, then you should never have a bug reappear. This should give you confidence that your code is doing what it is supposed to do.

The final point is that these processes make it fast to track down a bug; less time hunting bugs means more time delivering real product.

Why use saru for testing?

I've had a few questions about why I use my own little testing framework, saru, for testing, rather than using something standard like cxxunit or whatever.

One of the key problems is testing across multiple languages. We use C++, PHP and python for various pieces of our pipeline. Saru was designed so that plugging in a new language is easy. So if someday we need to support Java then I'm not worried. Now we could use cxxunit for C++, phpunit for php, etc, but I like to have unified reporting and a bit more integration of these tests.

Another problem I have with using a pure C++ framework is sometimes I want to test the condition "Class Foo should not be default constructible." The easiest way for me to check that is to have a fragment of code that should fail to compile with given error messages. You can't test that with a C++ testing framework. (In this particular case you may be able to do something with SFINAE style template hackery, or fork and call a compiler, but these just feel very hackish)
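A "this fragment must not compile" check is easy to script outside of C++. The sketch below is one way it could look in Python. It is not how saru does it, and the function name, compiler flags and expected error text are all assumptions for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def fails_to_compile(source: str, expected_error: str, compiler: list[str]) -> bool:
    """Return True if compiling `source` fails and stderr mentions `expected_error`."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "fragment.cpp"
        path.write_text(source)
        # The compiler command is passed in so it can be swapped per platform.
        result = subprocess.run(
            compiler + [str(path)], capture_output=True, text=True, check=False
        )
    return result.returncode != 0 and expected_error in result.stderr
```

A call might look like `fails_to_compile('#include "foo.h"\nFoo f;\n', "no matching function", ["g++", "-fsyntax-only", "-I."])`. The exact wording of the error depends on the compiler, so the expected text would need to be checked against the one you use.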

My third reason for using saru is that sometimes I want to do things that are not easily expressed in C++, but easily expressed in other languages. Things like "are the source files correctly formatted?", or "do all C++ exceptions have a corresponding PHP handler?", or "are there tests for each of the classes in this directory?". These are all C++ related questions, but much easier expressed in bash using awk/sed/grep or python than in pure C++.

So that's why I use saru.

Tuesday, January 18, 2011

Lean Development : Eliminate Waste II

This is going to be another short post. Luckily we only have three points to cover...

The remaining 3 points from my earlier post on the first principle of lean software development are:

Over Processing
Over Production
Defects and Rework

Over processing is doing more work on a product than is required. For software this may mean performance tuning code that isn't performance critical, handling edge cases that can never occur, making code overly flexible. All these kinds of things waste time. Now the tricky part is working out what the "over" part is. Thankfully this issue is addressed in an agile style development system where you get the roughest possible solution out to users and tweak it with their feedback. If the users aren't complaining then you don't need to fix it.

Over production in a production environment means producing more stock than the consumer will consume, or at a rate that leads to a build up of inventory. In a software situation I consider the features to be analogous to the product. If you are producing features that are not needed by the customer then you are over producing. In a software sense this is very similar to over-processing.

Defects and Rework, in my experience, are where a large chunk of the waste in traditional software development lies. I seem to spend a lot of time fixing bugs. So what do we do about it? Test Driven Design, Continuous Integration and "Stop the Line" are some of the tools advocated by Lean software. I guess I'll have to write about some of them in another post.

As I said this was going to be a short one. I'll move on to the next of the principles tomorrow...

Monday, January 17, 2011

Lean Development : Eliminate Waste

This post has taken longer than I'd hoped, and is consequently less polished than I'd hoped... but here goes...

The first entry in our list of Lean Development concepts shows Lean's focus on efficiency:

Eliminate Waste - No Extra Features, Churn or Boundaries

Before we can eliminate waste we must first understand what waste is. I think of this concept as "Don't do what you don't need to do" and waste is any resource use that is not driving your core business.

Here is a list of 7 deadly wastes - which I believe come from the original Toyota lean methodology.

Transportation
Inventory
Motion
Waiting
Over-Processing
Over-Production
Defects and Rework

In most of these cases the resource that is being wasted is time.

I'm going to cover the first 4 today, and more tomorrow.

Transporting a commodity from A to B takes time and may have a monetary cost. But in a digital world this is less applicable. However if you think of this as meaning inefficiencies in your supply chain this may make more sense. How long does it take you to get a "ready" version of the code out to your customers? How painful is this process? How robust? What are you doing in the process that can be automated or removed? Any manual interaction that is not needed is wasted effort. Any unneeded delays are wasted time.

Inventory in a software sense is completed features not shipped to the users. A feature is not providing value to a customer when it is sitting in your development version of the software. The sooner you can get a completed (and tested!) feature out to your users, the sooner it provides value to them, and thus drives value for you. Of course if every push to the users is taking a couple of days of effort then you have a transport issue that needs fixing first.

Motion, to me, means one developer (or task) doing unnecessary actions as part of its progress through the conceptual pipe from idea to implementation. These often crop up as bureaucracy - paperwork that will be discarded, double entry into multiple bug tracking systems, emails to supervisors, stuff that makes a developer busy but not productive. This is wasted effort and time.

Waiting... this usually means waiting for feedback from another party, or waiting for compilation, or waiting for tests to run. The developer ends up doing nothing productive, or is less productive due to context switching between tasks. So parallelize or speed up the process. Tests should be near instantaneous - if your full test suite takes an hour to run, split it up across 10 machines and it should take about 6 minutes. Compilation should be fast (and incremental) - use ccache, distcc or something like that. Feedback should be fast or at least predictable.
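Splitting a suite evenly takes a little care when tests have different run times. As a rough illustration (the function name and the timing data are invented), a greedy split that always hands the next-longest test to the least-loaded machine could look like this in Python:

```python
def shard_tests(durations: dict[str, float], machines: int) -> list[list[str]]:
    """Greedily assign tests (longest first) to the least-loaded machine."""
    shards: list[list[str]] = [[] for _ in range(machines)]
    loads = [0.0] * machines
    for name, seconds in sorted(durations.items(), key=lambda kv: -kv[1]):
        target = loads.index(min(loads))  # machine with the least work so far
        shards[target].append(name)
        loads[target] += seconds
    return shards
```

With 60 one-minute tests and 10 machines, every machine ends up with 6 tests, which is the 6 minute figure above. One very long test still sets a floor on the total time, however many machines you add.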

Well that's it for today. As I said, it's a bit choppy, but hopefully you got something out of it.

Sunday, January 16, 2011

Lean Development

I've been looking for something I could write a series of posts about... and I stumbled across Tom Poppendieck's business card. An odd thing to blog about you might think, but on the back of the card are 7 core ideas of Lean software development. I'm just going to repeat them here, then elaborate on what they mean to me. Lean is a style of development that I've been trying to move towards. It focuses heavily on efficiency, blending ideas from Agile Development with ideas learnt from large scale manufacturing in Japan. An odd combination, but it feels right to me.

So here's the list verbatim from Tom's card:

1. Eliminate Waste - No Extra Features, Churn or Boundaries
2. Build Quality In - Mistake-Proof with TDD; Write No New Legacy; Continuous Integration
3. Focus on Learning - Scientific Method; Challenge Standards; Feedback; Continuous Improvement
4. Defer Commitment - Break Dependencies; Maintain Options; Irreversible Decisions at Last Responsible Moment
5. Deliver Fast - Low Cost and Quality and Speed; Queuing Theory: Flow, Limit Work to Capacity
6. Respect People - Pride, Commitment, Trust and Applause; Effective Leadership; Respect Partners
7. Optimize the Whole - Measure Up, Avoid Sub-Optimization; Whole Value Stream & Whole Product

Oh and just a note, any mistakes here are purely my own, as are any opinions presented. But credit for the ideas really belongs elsewhere.
