LivingWithSourceCode

Tuesday, May 17, 2011

The why of unit testing

When you sit down to start writing test code, it's sometimes hard to see the wood for the trees. It's hard to make sense of writing a unit test for a piece of code which is maybe 20 lines long, and you can run it as part of the app and see that it works.

So far so good. It works, who needs tests.

Well, what about after someone else changes the code? Will it still work?

What about after someone changes the code in another branch, and a source code control system automagically merges your code. My change in this branch, your change in that branch, and an automated system decides the changes don't conflict, and puts them both in the merged version.

You can of course check each merged file, say all 40 source code files that you changed this week. You might spot the place where our changes don't play nice. But better still, you can run the unit tests. All of them, hundreds of them built up over time.

And that's the reason why you write unit tests, even for code that's obviously right.

Because things change, other people work on the code, some of them are new to the code base, some of them are new to the language.

So only about 5% of the value of units tests are to make sure the code works now, the real value of the unit tests are to make sure the code still works later.

There's other reasons too, but that's another post.

Thursday, April 21, 2011

Design Patterns - The WHY

Over time I have had the pleasure and indeed on rare occasions the misfortune to interview many software engineers.

One thing that you learn in interviews is how poorly some basic ideas are understood....

Factories
A very common misconception is the humble Factory Class.

A lot of people seem to miss the point. It's often been described as a place to collect together constructors, as if bunching them together in a single file was a goal in itself.

It's the WHY question that matters. Why on earth would you use a factory class?

The answer is quite simple. It's all about reducing dependencies.

It's all about dependencies
If I have a class ShapeManager that understands and cares about shapes, but has no particular need to know about or care about the particular type of a shape, then I want to preserve that separation. I am prepared to go to quite some lengths to prevent that class ever knowing about a Square, Circle or Triangle. As soon as it does know about individual shapes all is lost.

For example, if I suddenly see the error of my ways and change all the Simple Regular Polygons to a single class, my ShapeManager should ideally remain unchanged. I should not have to go into ShapeManger and look for all places where I have a particular instance of Square, Triangle, Pentagon, etc. and change them. That way lies madness.

Loading Shapes
Somewhere it the ShapeManger, I may care to load in Shape from a file. I could code it directly, but then my ShapeManager needs to start knowing about the types of Shape in the file. Madness I tell you.

What if I pass in an IShapeFactory, in this case an instance of ShapesFromAFileFactory()?

Now my ShapeManager sees an Interface which has a Function GetNextShape(), which unsurprisingly returns a Shape.

Now when I add a Simple Regular Polygon, or change the way shapes are stored, or even move the storage of the Shapes to the database, ShapeManager remains unchanged.

And I can also pass in an ShapesForTestingFactory() in my unit tests and run the tests without any external dependence on databases or Files.

The ShapeManager code is now more testable, it's less fragile, and please don't underestimate this part, it's more understandable to people who know about standard design patterns.
That bit matters a lot in a large software team. Depending on schedules, any of a half a dozen engineers with a passing knowledge of an area may be working on a change or a fix. If they see a Factory and know from roughly what is going on, then there's less "figuring out" to do before they can get started.

Monday, April 11, 2011

The State of things

The humble state machine has been implemented many many times, but enums are no longer the way to do it.

At the heart of things, you have some information about our current condition or state, and you have things that can happen, ie Events.

Some Events are valid for our current state, some are not.

If we use an object to record our state, then the object can have member functions to deal with events.

So, if we start with an AbstractState which cares about Enable, Disable, Pause, and Continue events. We give it virtual functions for each of these and in each case, we throw an "Invalid Event For State" exception, detailing the Name of the Current State and the Event.

Our Enabled State, derives from this, and implements Disable, and Pause.

IState Disable() {

return DisabledState; // this is just a static instance.

}

IState Pause() {

return PausedState; // this is just a static instance.

}

If you try to "Continue" an Enabled State, then you get an exception

For each state, you override the functions to deal with the possible events.

To process an Enable event, your code then calls

currentState = currentState.Enable();

And so on.

Do exercise care in deciding that you cannot go from state A to state B. Someone may take a personal dislike to you and all your children based on some artificial restriction that you enforce.

Friday, April 1, 2011

Names ending in 2

It's almost always a wrong thing to have a variable, class or function whose name ends in 2.

class SensorReadingTransform sensorReadingTransform;

class SensorReadingTransform2 sensorReadingTransform2;

Anyone care to make a guess as to the difference between the 2 classes above.

Does this code "speak to you"

Are you going to have to go look in the class to have any idea what the difference is?

If I'd renamed the first one, and created the second one as follows

class SensorReadingLinearTransform sensorReadingLinearTransform;

class SensorReadingNonLinearTransform sensorReadingNonLinearTransform;

Now you have at least a starting point. Yes, that's what I need, look further, or no, that's ok, I need to look somewhere else.

And truly, if the best name you can come up with is "Foo2" then that says a lot about the need for some rethinking.

Thursday, March 31, 2011

Simple Rules for Source Code Control.

Only check in one functional change at a time. Then when someone else has to merge the changes, or to undo one of them, or to cut a release, they will not be forced to question your progeny.

Add function overloads after the original function. Otherwise diff and merge can make an unholy mess of figuring out what changed where. Especially if anything else in the original function changed.

Put in a decent comment in the source code control. Yes someone can look at the files to see what you did, but the poor fool may be merging 6 months of work in one branch with 6 months of work in another. They really don't have time to play detective. You may even end up being that poor fool, and not remember every change that you coded 6 months ago. "Fixed bug 102312" is not an acceptable comment. Do you really want to have to log on to the bug tracking system and chase up each bug by reference number while you sort out an interesting part of a merge.

Don't manually apply the same change to 2 different branches. There is a special place in hell for programmers that do this. Use the Source Code Control System to push your change from one branch to another. Then the SCCS will have a history, and the shared base file will be updated. It may seem like a Royal PITA, but it will not be nearly as much as a PITA as the merge will be if you manually apply the same change to two branches, especially if you have add the functions in different places, or if the formatting is different because when you CUT & PASTE a function, the Environment automagically formats it.

SCCS should be a mandatory class in any degree that has any pretentions to Software Engineering. And one of the keystone projects should involve merging 2 code branches and producing a report of good practices that would make the merge easier. Ideally give each student changes to make to a branch, and randomly pair them and make each student merge to the other branch*.

(*Hey I never said I was a nice person)

Tuesday, March 8, 2011

Let the compiler do the work

If you have data that can be transposed it will surely be.

Look Mizuho, Instead of selling 1 share at ¥610,000, they sold 610,000 shares at ¥1 each.

A function that takes two parameters CalculateArea(double length, double width) will eventually be called with width and length. In this case, for most geometries that we care about, no harm done.

But what about some process function like :

void AddDilutedAcid(double ml, double PartsPerMillion)?

The following will compile, run, and cause all sorts of problems.

void MixUpDangerousChemical()
{
     double ml = getVolumeToAdd();
     double ppm = getConcentration();
     AddDilutedAcid(ppm, ml); // Nicely transposed. Should be ml, ppm
}

It would be nice to find that bug at compile time, not at runtime.

What if we had a simple Concentration class. At it's simplest, it would just contain and expose a double.

public class Concentration{
   public double ppm {get;set;}
}

void AddDilutedAcid(double ml, Concentration PartsPerMillion) {...}

void MixUpDangerousChemical()
{
     double ml = getVolumeToAdd();
     Concentration ppm = getConcentration();
     AddDilutedAcid(ppm, ml); // This throws a compiler error now !!
}

Let the compiler do the work. It always pay attention. Always.

Thursday, March 3, 2011

I'll Just....

If an electrician came to your house, and offered to save you running new wiring by using the fact that the Aluminium Window frames are pretty conductive....

So why do we do the same thing in software?

"I'll just add this feature in here" translates directly to "Man is this ever gonna be a mess in a few years time".

It's an invariant. The words "I'll Just" almost always precede a short term fix that will eventually pile up with all the other "I'll Just" fixes until the whole thing becomes a tangled mess of surprises.

You look in a function and find that it's also doing something else unrelated. "Ah yes, they wanted to be able to do.... so I just added it here"

There's a name for this one, "accidental coupling", and no, that has nothing to do with being drunk in a nightclub. Things get tied together because they happen together by coincidence.

Of course, later someone wants to do them separately. And in code, surprises are almost* never good things.

The PrintInvoice() function also sets the credit limit. Who'd have thought that? "Well we did that because we usually set the credit limit then..."

It was a wrong thing to do, and any and all justifications don't change that. Great, You've just used the Aluminium window Frame to carry the current to the Outside Lights. Nice.

There is a minimum amount of work that is required to do something right. You cannot avoid that work forever. You can put off that work, but when it comes to tidying up all the shortcuts, you may not like the interest and penalties that build up on all the work you should have done up front.

*Almost Never ~ ok, let's just say Never