tag:blogger.com,1999:blog-33268231963050762922024-02-20T01:20:58.575+10:30The Robots BrainMusing and notes of a coderMichael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.comBlogger43125tag:blogger.com,1999:blog-3326823196305076292.post-77927597124572164272018-06-22T14:34:00.002+09:302018-06-22T14:35:27.383+09:30Finding changes between branches with git.I'm often looking for a way to find the particular changes between two branches in git. I know I can use <br />
<span style="background-color: #2a2a2a; font-family: monospace;">git log -S'string'</span> (the pickaxe search).<br />
But it can be tricky to get concise output from it. <br />
<br />
I've added an alias to my <br />
<span style="background-color: #2a2a2a; font-family: monospace;">.gitconfig</span><br />
file to get me the info I want at a glance. <br />
<br />
<div style="background-color: #2a2a2a; font-family: monospace;"><pre>[alias]
findchanges = "!f() { revision=$1; shift ; \
    for x in $(git diff $revision --name-status | cut -f 2) ; do \
        git diff $revision -U0 -- $x | grep '^[+-] ' | sed 's#^#'$x': #' ; \
    done | grep \"$@\" ; \
} ; f"
</pre></div><br />
(Note: the trailing backslashes let the alias span multiple lines in <span style="background-color: #2a2a2a; font-family: monospace;">.gitconfig</span>.)<br />
<br />
This lets me do the following<br />
<br />
<div style="background-color: #2a2a2a; font-family: monospace;"><pre>> git findchanges my_branch -i todo
fileA.c: + // TODO: We really should fix this
fileB.c: - // TODO: Make sure we tweak the frobnitz later
</pre></div><br />
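When you only need the matching commits (rather than the individual lines), git's built-in pickaxe can answer a similar question on its own. Here's a self-contained demo in a throwaway repo (the file name, branch names, and commit messages are made up for illustration):

```shell
# Demo of the pickaxe on a disposable repo: list commits on my_branch
# (but not master) whose diffs add or remove the string "TODO".
cd "$(mktemp -d)" && git init -q .
git config user.email demo@example.com && git config user.name Demo
echo 'int x;' > fileA.c
git add fileA.c && git commit -q -m 'add fileA'
git branch -M master                         # name the base branch explicitly
git checkout -q -b my_branch
echo '// TODO: fix this' >> fileA.c
git commit -q -am 'note a todo'
git log --oneline -S TODO master..my_branch  # lists only 'note a todo'
```

The alias above goes further by printing the matching added/removed lines per file rather than per commit.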
Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-7273645146406424232014-05-06T13:43:00.000+09:302014-05-06T13:43:27.174+09:30Profiling IO in java on linux<p>This is more of a note to myself, but others may find it useful.</p><br />
<p>When you want to see what bottlenecks are being hit by some java process that you're running, you can do the following:</p><br />
<ol><li>Repeatedly grab backtraces using <tt>jstack PID</tt>. If you keep seeing threads blocked in a read/write-related call then you're IO bound</li>
<li>Check which files the process has open using <tt>lsof -p PID</tt>. It's likely that one of these will look suspect</li>
<li>Watch reads and writes to those open files using <tt>strace -f -p PID -e trace=read,write -e write=FD1,FD2,... -e read=FD1,FD2,...</tt></li><br />
</ol><br />
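The three steps can be bundled into a small helper. This is a hypothetical sketch (the function name and FD numbers are made up): it echoes each command rather than running it, with the real invocations left commented out, so it's safe to try anywhere:

```shell
# Hypothetical helper bundling the three steps above; pass the java PID
# and the comma-separated FD list you want strace to dump.
java_io_probe() {
    pid=$1; fds=$2
    for cmd in "jstack $pid" \
               "lsof -p $pid" \
               "strace -f -p $pid -e trace=read,write -e read=$fds -e write=$fds"
    do
        echo "+ $cmd"
        # $cmd        # uncomment to actually run each step
    done
}
java_io_probe 12345 3,4
```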
<br />
<br />
Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-22769664341995709232014-03-07T13:13:00.003+10:302014-07-10T17:12:08.955+09:30Cleaning up whitespace additions in git<p>I wrote a previous post on cleaning up end-of-line whitespace using git... But I've now run into a similar issue: the IDE used by several of the developers on my project<br />
likes to change the surrounding whitespace indentation of lines. This means that if you change just one line, the indentation changes will swamp the real changes, making the diff unpleasant to look at.</p><br />
<p>You can view a diff without showing whitespace changes in git using <code>--ignore-space-change</code> or <code>--ignore-all-space</code>. However, unfortunately the command<br />
</p><pre>git rebase -f master --ignore-space-change
</pre><p>doesn't do what one might hope.<br />
</p><br />
<p>However all is not lost. You can get the same kind of behaviour by plumbing together <code>git format-patch</code> and <code>git am</code>.</p><br />
<pre>git branch fixed 40caad7
git checkout fixed
git format-patch --stdout --ignore-all-space fixed..original | git am --ignore-whitespace
</pre><br />
<p>This takes the changes from <code>40caad7</code> to <code>original</code> and applies them to the new <code>fixed</code> branch, but removes/fixes whitespace changes.<br />
</p><br />
<p>However, this is not nice to use, so you can wrap it up into a little git alias (the trailing backslashes let the value span multiple lines in your .gitconfig):<br />
</p><br />
<pre>[alias]
cleanwhite = "!f() { \
    orig=$(git rev-parse HEAD) ; \
    mergebase=$(git merge-base HEAD $1) ; \
    git reset --hard $1 ; \
    git format-patch --stdout --ignore-all-space $mergebase..$orig | git am --ignore-whitespace ; \
} ; f"
</pre><br />
Now cleaning up your current branch is as simple as<br />
<br />
<pre>$ git cleanwhite master
</pre><br />
Which will clean up the whitespace in all commits from your current commit back to where you diverged from <code>master</code>.<br />
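To see the end-to-end effect, here's a self-contained demo on a throwaway repo (file and branch names are made up): a commit that both re-indents a line and adds a new one gets rewritten so that only the real addition survives.

```shell
# Demo: whitespace-only changes are dropped, real changes are kept.
cd "$(mktemp -d)" && git init -q .
git config user.email demo@example.com && git config user.name Demo
printf 'a\nb\n' > f.txt
git add f.txt && git commit -q -m base
git branch -M base                         # name the base branch explicitly
git checkout -q -b original
printf '    a\nb\nc\n' > f.txt             # re-indent 'a' AND add real line 'c'
git commit -q -am 'indent + add c'
git checkout -q -b fixed base
git format-patch --stdout --ignore-all-space base..original | git am -q --ignore-whitespace
git diff base -- f.txt                     # only the '+c' change remains
```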
Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-6485747362557109112012-10-20T13:22:00.003+10:302012-10-20T13:22:46.820+10:30Finding changes in ugly XML with git and xmllint<p>This is probably not going to be useful to very many people, but it helped me track down a small bug, and I'm preserving it mostly in case I need something similar later.</p><br />
<p>So the key issue here is that we have some XML that is stored in git. Unfortunately this generated XML is not nicely formatted. Thus changes in git don't show nicely using <code>git log</code> or <code>git diff</code>.</p><br />
<p>My technique was this:<br />
<ol> <li>Find the commits that changed the file of interest.<br/><br />
<code>git log --oneline afile.xml | awk '{print $1}'</code><br />
</li>
<li>Get the file at that revision.<br/><br />
<code>git show $REVISION:afile.xml</code><br />
</li>
<li>Get the file at the previous revision<br/><br />
<code>git show $REVISION~1:afile.xml</code><br /></li>
<li>Pass these through <code>xmllint --format</code> to clean them up</li><br />
<li><code>diff</code> the cleaned up versions</li><br />
</ol>This sounds pretty complex, but it can be wrapped up into a concise piece of bash script.
<script class="brush: bash" type="syntaxhighlighter">
<![CDATA[
git_pretty_xml() { git show $1 | xmllint --format - ; }
for REV in $(git log --oneline afile.xml | awk '{print $1}')
do
echo $REV
diff -u <( git_pretty_xml $REV~1:afile.xml ) <( git_pretty_xml $REV:afile.xml )
done
]]>
</script>
<br />
Now it's not going to work across merges, etc., but the general technique can be handy.<br />
<br />
I suspect a similar thing may have been obtainable by setting a custom diff tool in git but I couldn't see an easy way to get exactly what I wanted.Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-4013187533183999172012-10-03T18:34:00.000+09:302012-10-03T18:34:02.689+09:30Partial template specialization for functions in C++<h2>Partial template specialization for functions in C++</h2><div>The short of it is you can't do it. But you can do something that looks just like it.<br />
</div><h3>What is template specialization?</h3><div>For functions template specialization looks like this:<br />
<br />
<script class="brush: cpp" type="syntaxhighlighter">
<![CDATA[
#include <iostream>

template<int i, int j>
void say_something() {
std::cout<<"Hello "<<i<<","<<j<<std::endl;
}
template<>
void say_something<7,7>() {
std::cout<<"Double 7 aren't you lucky"<<std::endl;
}
int main() {
say_something<1,1>(); // prints "Hello 1,1"
say_something<2,3>(); // prints "Hello 2,3"
say_something<7,7>(); // prints "Double 7 aren't you lucky"
}
]]>
</script><br />
So the first template tells us what to do in general and the specialization tells us what to do in particular cases.<br />
</div><br />
<h3>What is partial template specialization?</h3><div>It's just like template specialization, but you're not specifying all the template parameters. So for the example above we might like to add a specialization that prints "Lucky 7" whenever the first template argument is 7. If we could write it, it would look like this (but it's not valid C++)<br />
<br />
<script class="brush: cpp" type="syntaxhighlighter">
<![CDATA[
template<int j>
void say_something<7,j>() {
std::cout<<"Lucky 7"<<std::endl;
}
]]>
</script><br />
<br />
However this doesn't work - it's not valid C++.<br />
</div><br />
<h3>What is the issue?</h3><div>C++ does not allow function partial specialization. <br />
</div><h3>How do we get around it?</h3><div>C++ <b>does</b> allow partial template specialization for <b>classes</b> (and structs). So our solution is just to defer to a templated helper class with static functions:<br />
<br />
<script class="brush: cpp" type="syntaxhighlighter">
<![CDATA[
#include <iostream>

//The base case.
template<int i, int j>
struct say_something_impl {
static void say_something() {
std::cout<<"Hello "<<i<<","<<j<<std::endl;
}
};
//Our first complete specialization
template<>
struct say_something_impl<7,7> {
static void say_something() {
std::cout<<"Double 7 aren't you lucky"<<std::endl;
}
};
//Our partial specialization
template<int j>
struct say_something_impl<7,j> {
static void say_something() {
std::cout<<"Lucky 7"<<std::endl;
}
};
// Our function that we wish we could specialise
// It now defers to the implementation templates
template<int i, int j>
void say_something() {
say_something_impl<i,j>::say_something();
}
int main() {
say_something<1,1>(); // prints "Hello 1,1"
say_something<7,1>(); // prints "Lucky 7"
say_something<7,7>(); // prints "Double 7 aren't you lucky"
}
]]>
</script><br />
<br />
It's verbose, but it works.<br />
<br />
<b>NOTE</b>: in this case we're working with `int` parameters, which would be handled better by a normal function containing a couple of `if` statements; however, all this works for type templates too.<br />
<br />
<br />
</div>Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com1Perth WA, Australia-31.932854 115.86194-32.7953185 114.5985125 -31.070389499999997 117.12536750000001tag:blogger.com,1999:blog-3326823196305076292.post-29517850714267028352012-04-19T13:12:00.000+09:302012-04-19T13:12:43.693+09:30Faking watch on OS X<p>Sometimes watching something change over time in your terminal is kinda helpful. Linux has a great utility for this called "watch". OS X doesn't come with this. Of course you can build it yourself if you want... but that can mean a chunk of pain too... luckily bash comes with what you need. "while" and some ANSI escape sequences are enough to get you going.</p>
<p>Here was my first try:</p>
<script class="brush: bash" type="syntaxhighlighter">
<![CDATA[
while true
do
clear
git log --graph --all
sleep 4
done
]]>
</script>
<p>The first problem with this for me was that in iterm2 clear just keeps adding to your terminal scrollback in some weird way. Luckily replacing it with some ANSI control sequences fixes that up nicely.</p>
<script class="brush: bash" type="syntaxhighlighter">
<![CDATA[
while true
do
echo -n -e "\x1b[2J\x1b[H"
git log --graph --all
sleep 4
done
]]>
</script>
<p>Now it doesn't add to the scrollback, but it does blink every 4 seconds. I fixed this by storing the current and previous results in temp files and only clearing if they've changed.</p>
<script class="brush: bash" type="syntaxhighlighter">
<![CDATA[
tempfoo=`basename $0`
TMPFILE0=`mktemp -q -t ${tempfoo}`
if [ $? -ne 0 ]
then
echo "$0: Can't create temp file, exiting..."
exit 1
fi
TMPFILE1=`mktemp -q -t ${tempfoo}`
if [ $? -ne 0 ]
then
echo "$0: Can't create temp file, exiting..."
exit 1
fi
while true
do
git log --graph --all -n 50 > $TMPFILE0
diff -q $TMPFILE0 $TMPFILE1 > /dev/null 2>&1
if [ $? -ne 0 ]
then
echo -n -e "\x1b[2J\x1b[H"
git log --graph --all -n 50
mv $TMPFILE0 $TMPFILE1
fi
sleep 4
done
]]>
</script>Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-19530138480422454482012-03-24T11:34:00.000+10:302012-03-24T11:34:02.724+10:30Farewell Tim<br />
It has been just over two weeks since my little brother Tim was killed. He was hit by a drunk driver while cycling - training for the Ride For Youth, a charity ride raising money for young people at risk of self harm or suicide.<br />
<br />
The Ride For Youth requires serious dedication from those participating, riding 640km over 5 days. They are expected to train with the team at least 3 days a week for 6 months. In the week before his death Tim rode over 400km. My Dad has been doing the ride for several years, and this year was going to be his last and Tim's first. Tim was athletic all his life, but with this training he was fitter than he'd ever been.<br />
<br />
Tim had an amazing group of friends. He touched so many lives. We've had messages of sympathy and support from all over the world. We estimate somewhere between 700 and 1000 people attended his funeral. Through his studies and work as Chemical Engineer he lived all over the world: Perth, Brisbane, Dubai, Wales, Switzerland and more. Everywhere he went he made great lifelong friends. To Tim everyone was a potential friend.<br />
<br />
Tim was also intelligent. He had left his work to pursue a Masters degree in Chemical Engineering, and was considering upgrading that to a Ph.D. Amazingly Tim was using many of the tools I'd studied in my Ph.D. and post-doc. In the last year we'd had conversations about such esoteric topics as "Convergence and Stability of Finite Element Methods."<br />
<br />
Finally, Tim was an amazing uncle to my daughter Kira, and my son Grant. Even after a training ride of 140km Tim would still have the energy to run around the table with Kira playing chasey. He'd leave the social events early to take Kira to the beach. Kira is going to miss him terribly, and I'm so sad that Grant will never get to know him. I was so looking forward to Tim teaching them both things like surfing and basketball. (Both of which I'm useless at.)<br />
<br />
<br />
<br class="Apple-interchange-newline" />If you want to support Tim's cause, don't send us flowers, donate here:<br />
<a href="http://www.rideforyouth.com.au/riders/enjo/tim-anderson">http://www.rideforyouth.com.au/riders/enjo/tim-anderson</a><br /><br />Some further information about Tim and the accident can be found here:<br />
<a href="http://www.facebook.com/remembertimbo">www.facebook.com/remembertimbo</a><br />
<br />
<br />
I don't know what else to say except<br />
<br />
Goodbye Tim,<br />
We're really going to miss you.<br />
<br />Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-69918434313669154452012-03-02T18:20:00.000+10:302012-03-02T18:20:59.920+10:30Installing node.js on OS X 10.5<p>The binary distributions of node.js no longer work on 10.5 (at least none that I could find). So I went about building my own from source. There are several pitfalls I had to overcome, so I figured I'd list the solutions here.</p>
<p>Step 1. Download the source. I did this using:
<pre>
git clone git://github.com/joyent/node.git
git checkout origin/v0.6.11-release
</pre>
</p>
<p>Step 2. Patch the included v8 source. If you don't patch it you get an error about missing symbols for <code>Dictionary<SeededNumberDictionaryShape, uint32_t>::SlowReverseLookup</code>, although the names may be mangled, so it's not too obvious.
<pre>
diff --git a/deps/v8/src/objects.cc b/deps/v8/src/objects.cc
index 88ebbf4..c4aea1c 100644
--- a/deps/v8/src/objects.cc
+++ b/deps/v8/src/objects.cc
@@ -10012,6 +10012,9 @@ template Object* Dictionary<UnseededNumberDictionaryShape, uint32_t>::
template Object* Dictionary<StringDictionaryShape, String*>::SlowReverseLookup(
Object*);
+template Object* Dictionary<SeededNumberDictionaryShape, uint32_t>::SlowReverseLookup(
+ Object*);
+
template void Dictionary<SeededNumberDictionaryShape, uint32_t>::CopyKeysTo(
FixedArray*,
PropertyAttributes,
</pre>
</p>
<p>Step 3. Download a new version of openssl and build and install shared versions. I downloaded version 0.9.8t and built it using:</p>
<pre>
./config shared
make
make install
</pre>
On OS X this installs to <code>/usr/local/ssl</code> by default.
If you try to use the system's default version of openssl then you get a bunch of errors:
<pre>
../src/node_crypto.cc: In member function ‘bool node::crypto::DiffieHellman::Init(int)’:
../src/node_crypto.cc:3537: error: ‘DH_generate_parameters_ex’ was not declared in this scope
../src/node_crypto.cc: In static member function ‘static v8::Handle<v8::Value> node::crypto::DiffieHellman::ComputeSecret(const v8::Arguments&)’:
../src/node_crypto.cc:3811: error: ‘DH_check_pub_key’ was not declared in this scope
../src/node_crypto.cc:3814: error: ‘DH_CHECK_PUBKEY_TOO_SMALL’ was not declared in this scope
../src/node_crypto.cc:3817: error: ‘DH_CHECK_PUBKEY_TOO_LARGE’ was not declared in this scope
</pre>
If you forget to build a shared version then node will almost compile, but complain about missing symbols at link time for the final binary.
</p>
<p>Step 4. Build node using the changes you've made:
<pre>
./configure --openssl-includes=/usr/local/ssl/include/ --openssl-libpath=/usr/local/ssl/lib/
make
make install
</pre>
</p>
<p>Now hopefully you'll have a running version of node installed.
</p>Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-58722146903330001592012-02-09T14:50:00.001+10:302012-02-09T14:52:19.900+10:30Javascript objects are not hashes<p>I've some across several posts from people warning about using objects as maps( [<a href="http://www.2ality.com/2012/01/objects-as-maps.html">1</a>], [<a href="http://www.devthought.com/2012/01/18/an-object-is-not-a-hash/">2</a>] ), and while they're right I think they miss an important feature. <b>Objects are not hashes.</b> Instead they try to show you how to use them like objects.</p>
<p>The key to the whole issue is the difference in behavior of <code>in</code> and <code>hasOwnProperty</code>.</p>
<p>For example if I want to set a property in an object using a user supplied string I would do this:</p>
<script type="syntaxhighlighter" class="brush: javascript"><![CDATA[
posts = {}
if( slug in posts )
{
//It's got the property, but it's inherited via the prototype chain
if( ! posts.hasOwnProperty(slug) )
{
do_error("Invalid post name");
}
else
{
//It's already got content so append to it.
posts[slug] += content;
}
}
else
{
posts[slug] = content;
}
]]></script>
<p>This makes it clear that assigning & updating a field is a ternary issue (add/append/fail) rather than a binary one (add/append).</p>
If you're feeling paranoid about already broken data, then you might change the use of <code>posts.hasOwnProperty</code> to <code>Object.prototype.hasOwnProperty.call(posts,slug)</code>. But this code should prevent <code>hasOwnProperty</code> from getting overwritten in the first place.Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-77615316617502017242011-07-05T09:38:00.001+09:302012-04-19T13:13:40.473+09:30Examining multiple cores with gdb and grep<p>So you've got a bunch of core files from a failing application. You look in one of them and find what's causing that particular error. However you're not sure whether all the cores are caused by the one same issue.</p>
<p>The application I was having trouble with was a multithreaded app, so I couldn't rely on the stacks being identical every time... so this is what I did.</p>
<b>stackdumper.gdb</b>
<script type="syntaxhighlighter" class="brush: text"><![CDATA[
thread apply all backtrace
]]></script>
Then in bash:
<script type="syntaxhighlighter" class="brush: text"><![CDATA[
for x in core.1309*
do echo $x
gdb -batch -x stackdumper.gdb ./server.x $x | grep SomeFunctionThatIsCausingCrashes
done
]]></script>
<p>In my case I found that string in all 80 cores, so I know that was the only issue.</p>Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-87127182052649507372011-06-02T21:43:00.002+09:302012-02-09T14:52:32.542+10:30Javascript databindings with observersThis is not quite as clear as the previous post, but here's a version of my javascript prototyping with observers on the data.
Again you can play with it using <a href="http://jsfiddle.net/gh/gist/jquery/1.6/1004318/">jsfiddle</a>... or here's the code.
<script src="https://gist.github.com/1004318.js?file=fiddle.html"></script>
<script src="https://gist.github.com/1004318.js?file=fiddle.js"></script>Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-71801609871559181502011-06-02T13:33:00.001+09:302012-02-09T14:53:41.642+10:30Creating data bindings using javascript closuresI've been working on a javascript codebase that stores all its data in forms, hiding and showing all the form elements as the view changes. This has worked well while the structure of the data was relatively rigid. But now that we've got dynamically structured (tree-like) data, we need to be able to grow and shrink the views in more flexible ways.<br />
<br />
While I'm not a gung-ho MVC advocate, it certainly looked like the app could do with a bit of an MVC style cleanup (the data was in the view rather than being accessed by the view).<br />
<br />
The problem was it wasn't clear how to get the data out of the view... here's a simplified version of the method I'm planning to use.<br />
<br />
<script src="https://gist.github.com/1003899.js?file=fiddle.html">
</script>
<script src="https://gist.github.com/1003899.js?file=fiddle.js">
</script>
<br />
You can also play with this using <a href="http://jsfiddle.net/gh/gist/jquery/1.6/1003899/">jsfiddle</a><br />
<br />
I'm thinking about implementing broadcasting of changes and a more model-based approach later... but maybe that'll have to be another post.Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-72296243016655972702011-05-18T13:47:00.000+09:302011-05-18T13:47:11.277+09:30Gits prepare-commit-message hookSo I often write bad commit messages. At best they're inconsistent - sometimes with ticket ids at the start, sometimes at the end. Sometimes only a short message, sometimes a nicely formatted bullet point list.<br />
<br />
Thankfully git has a tool to help you get these things consistent. It's called the "prepare-commit-msg" hook.<br />
<br />
I wrote a little python script to make a default commit message. It's not perfect, but it should help... It takes the branch name, searches for a version tag and removes it, then searches for anything that might be a ticket id and adds it to the message. Finally it adds some boilerplate.<br />
<br />
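The idea can be sketched in shell. This is a hypothetical minimal equivalent (the real script is the Python gist below, and the branch-naming convention assumed here, e.g. "v2.1_feature_188", is made up for illustration):

```shell
#!/bin/sh
# Hypothetical prepare-commit-msg sketch: strip a leading version tag from
# the branch name, treat the first run of digits as a ticket id, and prefix
# the prepared commit message with it (e.g. "[#188] ").
ticket_from_branch() {
    echo "$1" | sed 's/^v[0-9.]*_//' | grep -o '[0-9][0-9]*' | head -n 1
}

MSGFILE=$1
branch=$(git symbolic-ref --short HEAD 2>/dev/null)
ticket=$(ticket_from_branch "$branch")
if [ -n "$ticket" ] && [ -f "$MSGFILE" ]
then
    sed -i.bak "1s/^/[#$ticket] /" "$MSGFILE"
fi
```

Dropped into `.git/hooks/prepare-commit-msg` and made executable, git runs it with the message file as its first argument.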
<script src="https://gist.github.com/977979.js?file=prepare-commit-msg.py"></script>Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-84178952534938390352011-05-18T13:37:00.001+09:302011-05-18T13:39:38.152+09:30Command line git trees<p>Often I'm stuck in a terminal and want to see the commit history in a nice tree format.
There's a nice solution ... git log recently learnt the "--graph" feature.</p>
<p>You can get pretty nice results from "--graph --oneline"</p>
<script type="syntaxhighlighter" class="brush: text"><![CDATA[
bash> git log --graph --oneline
* 3b360c7 Made the Foozle boogie correctly
* f3b5485 Added a Foozle
* 7756624 Fix for unfrobnicated widget
|\
| * 16fbd87 Frobnicated the widget
| * 79e0b78 Added the foozle
|/
* 94d6b67 Start of version v2.1
]]></script>
<p>But using the tricks from <a href="http://www.jukie.net/bart/blog/pimping-out-git-log">http://www.jukie.net/bart/blog/pimping-out-git-log</a> we can get a much nicer result - including author and branches and time all coloured nicely</p>
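An alias along those lines can be set up like this (a sketch: the format string is an approximation of the one in the linked post, not necessarily identical):

```shell
# Define a 'git lg' alias: one-line graph with hashes, refs, subject,
# relative date and author, all coloured.
git config --global alias.lg "log --graph --abbrev-commit --date=relative \
--pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%an>%Creset'"
```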
<script type="syntaxhighlighter" class="brush: text"><![CDATA[
bash> git lg
* 3b360c7 - (HEAD, feature_188) [#188] Made the Foozle boogie correctly (2 hours ago) <Michael Anderson>
* f3b5485 - [#188] Added a Foozle (2 hours ago) <Michael Anderson>
* 7756624 - (v2.1_dev) [#271,#277 complete] Fix for unfrobnicated widget (3 hours ago) <Michael Anderson>
|\
| * 16fbd87 - (feature_271_277) [#271,#277] Frobnicated the widget (3 hours ago) <Michael Anderson>
| * 79e0b78 - [#271] Added the foozle (3 hours ago) <Michael Anderson>
|/
* 94d6b67 - (v2.1_base) Start of version v2.1 (3 hours ago) <Michael Anderson>
]]></script>Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-4686542657352599262011-04-29T18:58:00.001+09:302011-04-29T18:59:16.687+09:30Rendering maths from a web pageOccasionally I do pretty math intensive coding.<br />
And one of the best pieces of software for writing about maths is <a href="http://en.wikipedia.org/wiki/TeX">TeX</a>. <br />
It was written by Donald Knuth, one of the fathers of modern computer science, in 1978, and it is still in use all over the world.
So when it comes to putting math on the web, the workflow used to look like this:
<ul>
<li>render math using tex</li>
<li>copy and paste somewhere</li>
<li>add an image link to your docs.</li>
</ul>
However, some clever people have made that process much easier; two efforts in particular look very nice:
<ul>
<li><a href="http://code.google.com/apis/chart/docs/gallery/formulas.html">TeX Formulas - from the Google Chart API</a></li>
<li><a href="http://www.mathjax.org/">MathJax</a></li>
</ul>
<br /><br />
Here's some MathJax: $ \nabla \times \vec{\rho} = 0 $<br />
Here's the same thing using the Google Chart API:
<img src="http://chart.apis.google.com/chart?cht=tx&chl=%5Cnabla%20%5Ctimes%20%5Cvec%7B%5Crho%7D%20%3D%200&chs=&chf=&chco="/><br/><br/>
I think the mathjax is nicer, but the google charts version is easier to integrate.Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-25059481866485780092011-04-01T19:40:00.001+10:302011-04-14T21:01:55.682+09:30Using saru with googlemock and googletest<a href="https://github.com/roarengine/saru">Saru</a> (<a href="http://therobotsbrain.blogspot.com/2009/10/testing-with-saru.html">earlier blog post</a>) is my little testing framework. It does everything we need. It's been used in several serious software development situations. But sometimes things are a little painful.<br />
<br />
For example, saru doesn't come with any nice mocking helpers for C++. There is a basic C++ testing library that comes with it, but writing your own mocks by hand is one of those painful things I mentioned.<br />
<br />
Thankfully there's a nice mocking library for C++ from the folks at google, called oddly enough, <a href="http://code.google.com/p/googlemock/">googlemock</a>. However googlemock is designed to work with <a href="http://code.google.com/p/googletest/">googletest</a> - the google testing framework.<br />
<br />
Googletest is also great. But it's orthogonal to saru rather than competitive with it. Saru is cross-language and designed to be more of a test-running wrapper, while googletest is a C++ unit-testing library.<br />
<br />
So I had three options if I wanted to use google-mock with my code and saru.<br />
<ol>
<li> Make the google-mocks work with the saru-cxx library.</li>
<li> Make google-test output in a format that saru could digest.</li>
<li> Make saru able to parse google-test output.</li>
</ol>
IMO the third is the wisest and most extensible option. Luckily the <a href="https://github.com/roarengine/saru/commit/676f6eaf3e079dab71d82578d6239c75d32e3f2c">changes</a> were pretty easy.<br />
<br />
So now getting a google-test file working in saru is as trivial as adding a
<br />
<pre class="prettyprint lang-cpp">
//SARU : Format gtest
</pre>
to the top of the test file... and everything just works :)<br />
(Well, you'll need to make sure the compiler can find the right includes and the gtest library... but that's all.)<br />
<br />
For example I get this kind of output when running a test suite.<br />
<pre>99-misc-00-periodic-processor.cpp::TestFixture::test_process_many : OK
gmock_test.cpp::PartyTest.CallsDance : OK
gmock_test.cpp::PartyTest.CallsDanceFails : FAILED
==MESSAGE==
==STDERR==
gmock_test.cpp:69: Failure
Value of: p.party()
Actual: false
Expected: true
</pre>Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-84208699043404033912011-03-02T10:00:00.002+10:302011-03-03T09:20:21.460+10:30Introducing the fastjson librarySo using my poor man's sampler (and the awesome shark tool on OS X) we found that my application spends a lot (90%) of its time serializing and deserializing json. That's "not good".<br />
<br />
I've had a lot of pain finding a good json library for C or C++. Some are one-way (encode or decode only). Some won't handle large numbers (uint64_t at least). Some can't handle unicode.<br />
<br />
The boost version based on boost::property_tree looked very promising at first. But it has a lot of nasty edge cases. (Unicode is broken unless you're using wchar_t, value nodes lose their type and become strings, certain edge cases won't serialize properly). We managed to work around all these issues, patching the boost libraries and putting hacks into our code. But the insurmountable problem is that it is SLOW.<br />
<br />
Now I could look for another library and work around its idiosyncrasies, but boost is already the second library that we've tried to work around.. and none of the other libraries look promising (or have licenses we can work with...)<br />
<br />
What can you do when one of your core libraries is causing most of your pain (bugs, performance, hacks)? Rewrite it ;)<br />
<br />
This is not a path we took lightly. But we have an alpha/beta working version of the library, and it's about to go into our live servers. The code is a bit disorganized and could do with a bit of a clean-up... but our goals have been reached. It seems to parse and write json about 20x faster than our corresponding boost-wrapped code. It performs far fewer allocations. It doesn't throw away type information. It does support arbitrary-sized numbers. It does the right thing with unicode (except one edge case of properly converting some UTF8 into surrogate-escaped UTF16 pairs).<br />
<br />
Anyway I'll do a proper post on how to use it soon, but for now here's the github link:<br />
<br />
<a href="https://github.com/mikeando/fastjson">https://github.com/mikeando/fastjson</a><br />
<br />
I'd love people to play with it and leave feedback.<br />
<br />
There's also a <a href="http://blog.roarengine.com/2011/02/fastjson-or-why-we-stopped-using-3rd.html">post</a> about it on my company blog.Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-78014329264347659142011-02-14T14:08:00.001+10:302011-04-14T21:06:38.893+09:30Poor-man sampler saves the day again.So one of my applications was being very slow to start up when running in a VM.<br />
Running the poor man's sampler detailed in my <a href="http://therobotsbrain.blogspot.com/2011/02/poor-mans-sampling-profiler-for-live.html">previous post</a>, I found that I got a heap of samples looking like this:<br />
<br />
<br />
<pre>Thread 1 (Thread 0xb6f436f0 (LWP 1657)):
#0  0xb7828430 in __kernel_vsyscall ()
#1  0xb7370f93 in read () from /lib/tls/i686/cmov/libc.so.6
#2  0xb731aedb in _IO_file_underflow () from /lib/tls/i686/cmov/libc.so.6
#3  0xb731dcc8 in __underflow () from /lib/tls/i686/cmov/libc.so.6
#4  0xb731a888 in ?? () from /lib/tls/i686/cmov/libc.so.6
#5  0xb731c7b8 in _IO_sgetn () from /lib/tls/i686/cmov/libc.so.6
#6  0xb73103be in fread () from /lib/tls/i686/cmov/libc.so.6
#7  0x0811028f in main (argc=2, argv=0xbffccf54) at src/server.cpp:252
</pre><br />
<br />
That is, I was waiting for a read to complete. Sometimes for 5-10 seconds, sometimes for a few minutes.<br />
What was this troublesome read? An unexpected socket call? Nope - something I would never have expected. Here's the "offending" code.<br />
<br />
<pre class="prettyprint lang-cpp">
#include &lt;cstdio&gt;   /* fopen, fread, fclose */
#include &lt;cstdlib&gt;  /* srandom */

FILE * f = fopen("/dev/random", "r");
unsigned seed;
fread( &seed, sizeof(seed), 1, f);
fclose(f);
srandom(seed);
</pre>
This code seeds the random number generators using some values pulled from the random device /dev/random.<br />
<br />
Reading from /dev/random blocks until there is enough entropy in the kernel's entropy pool to complete the read. On an isolated VM there's not much system noise generating entropy, so the pool drained quickly and refilled slowly.<br />
<br />
It turns out there's a non-blocking random device that uses feedback of hashed values to avoid blocking when the entropy pool is low, so I switched the code to read from "/dev/urandom" and all was OK.<br />
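The difference is easy to see from a shell on a Linux box (a quick sketch, not part of the original fix): reads from /dev/urandom always return immediately, so pulling a seed this way never stalls.

```shell
# Read 4 bytes from the non-blocking device and print them as an
# unsigned 32-bit integer, a quick way to grab a seed in a script.
# Doing the same read against /dev/random can block on a quiet VM.
seed=$(od -An -tu4 -N4 /dev/urandom | tr -d ' ')
echo "seed=$seed"
```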
<br />
Now maybe I could have found this by doing some traditional debugging (or maybe not, as that may have generated enough noise to stop /dev/random blocking!). But using stack dumps from gdb, the problem was found in a few minutes.<br />
<div>
<br /></div>
<br />
<br />
<br />Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-3369265841253115692011-02-06T15:20:00.000+10:302011-02-06T15:20:07.926+10:30Poor mans sampling profiler for live processes using GDBWhile working on OSX I've got used to having the Shark system profiler at my fingertips. I love being able to see what's going on in a live process, where all the threads are stuck, and what's taking up all the time on my system.<br />
<br />
On Linux you can use the oprofile kernel module, or the commercial Zoom profiler (which I believe uses a modified oprofile under the hood).<br />
<br />
However, if these aren't available to you, you can attach to your process with gdb and manually Ctrl-C, backtrace, and continue to get a feel for what's going on. This is suggested in several posts on Stack Overflow (<a href="http://stackoverflow.com/questions/266373/one-could-use-a-profiler-but-why-not-just-halt-the-program">here</a> and <a href="http://stackoverflow.com/questions/375913/what-can-i-use-to-profile-c-code-in-linux/378024#378024">here</a>).<br />
<br />
A neater way to do this, pausing the application for as little time as possible, is:<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">gdb -batch -x stackdumper.gdb ./a.out 123456 > stack.0</span><br />
<br />
where ./a.out is the binary you are interested in and 123456 is the PID.<br />
<br />
If you set stackdumper.gdb to contain<br />
<br />
<span class="Apple-style-span" style="font-family: 'Courier New', Courier, monospace;">thread apply all backtrace</span><br />
<br />
Then you'll get a backtrace on all threads. The advantage of this over the manual method is that the binary is stopped for as little time as possible.<br />
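To turn one-off dumps into a crude sampling profile, the command can be wrapped in a loop and the resulting files tallied. A sketch, assuming the same stackdumper.gdb, ./a.out and PID as above (all placeholders):

```shell
# Take ten snapshots of every thread's stack, one second apart.
# Requires gdb on the PATH; ./a.out and the PID are placeholders.
sample() {
    pid=$1
    for i in 0 1 2 3 4 5 6 7 8 9; do
        gdb -batch -x stackdumper.gdb ./a.out "$pid" > "stack.$i"
        sleep 1
    done
}

# Tally which frames recur across the samples; frames that dominate
# the count are where the process is spending (or losing) its time.
tally() {
    grep -h '^#[0-9]' stack.* | sort | uniq -c | sort -rn | head
}
```

Run <span style="font-family: monospace;">sample 123456</span> and then <span style="font-family: monospace;">tally</span>; blocked calls float straight to the top of the count.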
<br />
I used this to find that all our threads were waiting on some JSON writing code that should have been fast.<br />
That is, a sample of about 10 runs of the sampler showed one thread deep in JSON decoding and 2-7 other threads all waiting in pthread mutex/condition code.Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-75838745028679874052011-01-27T15:01:00.001+10:302011-01-27T15:02:08.330+10:30Lean Development : Focus on LearningSo, continuing on with my notes on Lean Development, we're going to look at point 3 from <a href="http://www.poppendieck.com/">Tom Poppendieck</a>'s business card.
<br />
<br />
3. Focus on Learning - Scientific Method; Challenge Standards; Feedback; Continuous Improvement<br />
<br />
Let's start from the back of the list and work to the front, as I find that to be the logical progression of the sub-points.<br />
<br />
Continuous Improvement is about continually tweaking your production pipeline and/or the product so that you become more efficient in producing what the customer wants.<br />
<br />
Feedback is how you evaluate the changes you make to your pipeline. If your metrics improve, then accept the change; if not, work out why it didn't help and undo the change.<br />
<br />
Challenge Standards - This is a cultural attitude of not accepting the status quo. It's the need for the people involved to have the desire and ability to change the system. If something is a standard then there needs to be a reason why. If not, it's arbitrary and subject to being changed (provided the change can be demonstrated to be beneficial).<br />
<br />
Scientific Method - This is how we go about getting change implemented. First you hypothesise the measurable results the change should produce, then you make the change. If what you hypothesised didn't happen then you need to undo it and re-evaluate. "Switching to a frooble compiler should reduce test execution times by 50%, increasing throughput by 3 story-points per week". Results and experiments should be documented somewhere so that the company can learn from what was done (positive or negative).<br />
<br />
If we do these things we will understand how our process works, and will have a system by which the process can adapt to the changing environment in which the development pipeline exists.<br />
<br />
<br />Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-53489325588762232872011-01-24T14:55:00.002+10:302011-01-24T15:00:35.489+10:30Lean Development : Build Quality InThis is the fourth post in my mini-series about Lean Development.<br />
<br />
The bullet point I'm going to be writing about today is:<br />
<ul>
<li>Mistake-Proof with TDD; Write No New Legacy; Continuous Integration</li>
</ul>
<div>
TDD is "Test Driven Design". This means not only testing all your code, but writing your tests before you code, and letting your design be driven by the issues that arise while making those tests pass. This also holds for bug fixing -- create a test case that reproduces and narrows down the bug, then make that test pass.</div>
<br />
<div>
"Write No New Legacy" means don't write code that is hard to maintain. This means modular and easily testable code. Make sure the hard bits of the code are documented.</div>
<div>
<br />
Continuous Integration means that your code should be built and tested on every check-in. Broken builds should be addressed immediately. Implicit in this is some kind of version control system. The CI run is often a larger batch of tests than the specific unit/functional tests used while designing a single feature. Ideally it will run on multiple test systems, one for each system you deploy to.</div>
<br />
<div>
One idea behind these points is to make it hard for an unnoticed error to reach production. TDD makes it difficult to make the error in the first place. "Write No New Legacy" means you shouldn't have tricky untestable code paths to trip you up. "Continuous Integration" means you should never have a "broken system" on your hands.</div>
<br />
<div>
The bigger idea behind these points is continuous improvement of the code-base. If all your tests from TDD get plugged into the continuous integration suite, then you should never have a bug reappear. This should give you confidence that your code is doing what it is supposed to do.</div>
<br />
<div>
The final point is that these processes make it fast to track down a bug, less time hunting bugs means more time delivering real product.</div>Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-66752363547556659802011-01-24T11:25:00.001+10:302011-01-24T11:26:43.712+10:30Why use saru for testing?I've had a few questions about why I use my own little testing framework, <a href="http://github.com/squishyhumans/saru">saru</a>, for testing, rather than using something standard like cxxunit or whatever.<br />
<br />
One of the key problems is testing across multiple languages. We use C++, PHP and Python for various pieces of our pipeline. Saru was designed so that plugging in a new language is easy, so if someday we need to support Java then I'm not worried. Now we could use cxxunit for C++, phpunit for PHP, etc., but I like to have unified reporting and a bit more integration between these tests.<br />
<br />
Another problem I have with using a pure C++ framework is sometimes I want to test the condition "<i>Class Foo should not be default constructible</i>." The easiest way for me to check that is to have a fragment of code that should <b>fail to compile</b> with given error messages. You can't test that with a C++ testing framework. (In this particular case you may be able to do something with SFINAE style template hackery, or fork and call a compiler, but these just feel very hackish)<br />
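As a sketch of the underlying idea (this is not saru's actual implementation; the file names and compiler invocation are hypothetical, and it assumes g++ or similar on the PATH), a compile-must-fail test can simply invert the compiler's exit status:

```shell
# The snippet default-constructs a class whose only constructor
# takes an int, so a conforming compiler must reject it.
cat > no_default_ctor.cpp <<'EOF'
struct Foo { explicit Foo(int); };
int main() { Foo f; return 0; }
EOF

# Pass only if compilation FAILS; diagnostics land in compile.log,
# where a fancier harness could also grep for an expected message.
if g++ -std=c++11 -fsyntax-only no_default_ctor.cpp 2>compile.log; then
    result="FAIL: Foo was default constructible"
else
    result="PASS: compiler rejected default construction"
fi
echo "$result"
```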
<br />
My third reason for using saru is that sometimes I want to do things that are not easily expressed in C++, but easily expressed in other languages. Things like "<i>are the source files correctly formatted?</i>", or "<i>do all C++ exceptions have a corresponding PHP handler?</i>", or "<i>are there tests for each of the classes in this directory?</i>". These are all C++ related questions, but much more easily expressed in bash using awk/sed/grep, or in Python, than in pure C++.<br />
<br />
So that's why I use saru.Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-43335692448772505682011-01-18T21:31:00.002+10:302011-01-19T13:15:54.068+10:30Lean Development : Eliminate Waste II<i>This is going to be another short post. Luckily we only have three points to cover...</i><br />
<br />
The remaining 3 points from my earlier post on the first principle of lean software development are:<br />
<ul>
<li>Over Processing</li>
<li>Over Production</li>
<li>Defects and Rework</li>
</ul>
Over-processing is doing more work on a product than is required. For software this may mean performance-tuning code that isn't performance critical, handling edge cases that can never occur, or making code overly flexible. All of these things waste time. The tricky part is working out what counts as "over". Thankfully this issue is addressed in an agile-style development system, where you get the roughest possible solution out to users and tweak it with their feedback. If the users aren't complaining then you don't need to fix it.<br />
<br />
Over-production, in a manufacturing environment, means producing more stock than the consumer will consume, or producing at a rate that leads to a build-up of inventory. In a software situation I consider features to be analogous to the product. If you are producing features that are not needed by the customer then you are over-producing. In this sense it is very similar to over-processing.<br />
<br />
Defects and Rework are, in my experience, where a large chunk of the waste in traditional software development lies. I seem to spend a lot of time fixing bugs. So what do we do about it? Test Driven Design, Continuous Integration and "Stop the Line" are some of the tools advocated by Lean software development. I guess I'll have to write about some of them in another post.<br />
<br />
<i>As I said this was going to be a short one. I'll move on to the next of the principles tomorrow... </i>Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-54708713902615915982011-01-17T21:19:00.002+10:302011-01-19T13:15:30.361+10:30Lean Development : Eliminate Waste<i>This post has taken longer than I'd hoped, and is consequently less polished than I'd hoped... but here goes...</i><br />
<br />
The first entry in our list of Lean Development concepts shows lean's focus on efficiency:<br />
<blockquote>
Eliminate Waste - No Extra Features, Churn or Boundaries</blockquote>
Before we can eliminate waste we must first understand what waste is. I think of this concept as "Don't do what you don't need to do" and waste is any resource use that is not driving your core business.<br />
<br />
Here is a list of 7 deadly wastes - which I believe come from the original Toyota lean methodology.<br />
<br />
<ol>
<li>Transportation</li>
<li>Inventory</li>
<li>Motion</li>
<li>Waiting</li>
<li>Over-Processing</li>
<li>Over-Production</li>
<li>Defects and Rework</li>
</ol>
<div>
In most of these cases the resource that is being wasted is time. </div>
<div>
I'm going to cover the first 3 today, and more tomorrow.</div>
<div>
<br /></div>
<div>
Transporting a commodity from A to B takes time and may have a monetary cost, but in a digital world this is less directly applicable. It makes more sense if you think of it as inefficiency in your supply chain. How long does it take you to get a "ready" version of the code out to your customers? How painful is this process? How robust? What are you doing in the process that could be automated or removed? Any manual interaction that is not needed is wasted effort. Any unneeded delays are wasted time. </div>
<div>
<br /></div>
<div>
Inventory in a software sense is completed features not yet shipped to the users. A feature is not providing value to a customer while it is sitting in your development version of the software. The sooner you can get a completed (and tested!) feature out to your users, the sooner it provides value to them, and thus drives value for you. Of course, if every push to the users takes a couple of days of effort then you have a transport issue that needs fixing first.</div>
<br />
<br />
Motion, to me, means a developer (or task) performing unnecessary actions on its way through the conceptual pipe from idea to implementation. These often crop up as bureaucracy - paperwork that will be discarded, double entry into multiple bug-tracking systems, emails to supervisors; stuff that makes a developer busy but not productive. This is wasted effort and time.<br />
<br />
Waiting usually means waiting for feedback from another party, or for compilation, or for tests to run. The developer ends up doing nothing productive, or is less productive due to context switching between tasks. So parallelize or speed up the process. Tests should be near instantaneous - if your full test suite takes an hour to run, split it up across 10 machines and it should take 6 minutes. Compilation should be fast (and incremental) - use ccache, distcc or something similar. Feedback should be fast, or at least predictable.<br />
<br />
<i>Well that's it for today. As I said it's a bit choppy, but hopefully you got something out of it.</i><br />
<br />Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0tag:blogger.com,1999:blog-3326823196305076292.post-58675350745561370092011-01-16T10:32:00.001+10:302011-01-17T21:19:57.198+10:30Lean DevelopmentI've been looking for something I could write a series of posts about, and I stumbled across <a href="http://www.poppendieck.com/">Tom Poppendieck</a>'s business card. An odd thing to blog about, you might think, but on the back of the card are 7 core ideas of Lean software development. I'm just going to repeat them here, then elaborate on what they mean to me. Lean is a style of development that I've been trying to move towards; it focuses heavily on efficiency, blending ideas from Agile development with ideas learned from large-scale manufacturing in Japan. An odd combination, but it feels right to me.<br />
<br />
So here's the list verbatim from Tom's card:<br />
<br />
1. Eliminate Waste - No Extra Features, Churn or Boundaries<br />
2. Build Quality In - Mistake-Proof with TDD; Write No New Legacy; Continuous Integration<br />
3. Focus on Learning - Scientific Method; Challenge Standards; Feedback; Continuous Improvement<br />
4. Defer Commitment - Break Dependencies; Maintain Options; Irreversible Decisions at Last Responsible Moment<br />
5. Deliver Fast - Low Cost and Quality and Speed; Queuing Theory: Flow, Limit Work to Capacity<br />
6. Respect People - Pride, Commitment, Trust and Applause; Effective Leadership; Respect Partners<br />
7. Optimize the Whole - Measure Up, Avoid Sub-Optimization; Whole Value Stream &amp; Whole Product<br />
<br />
Oh and just a note, any mistakes here are purely my own, as are any opinions presented. But credit for the ideas really belongs elsewhere.Michael Andersonhttp://www.blogger.com/profile/00877984226868333502noreply@blogger.com0