bling.github.io

Saturday, December 11, 2010

Auto Mocking NSubstitute with Castle Windsor

I was debating whether to write this blog post because it’s so damn simple to implement, but hey, if it saves someone else some time, I’ve done some good.

First of all, register an ILazyComponentLoader into Windsor:

var c = new WindsorContainer();
c.Register(Component.For<LazyComponentAutoMocker>());

Then, the implementation of LazyComponentAutoMocker is simply this:

using System;
using System.Collections;
using Castle.MicroKernel.Registration;
using Castle.MicroKernel.Resolvers;
using NSubstitute;

public class LazyComponentAutoMocker : ILazyComponentLoader
{
  // Any service Windsor can't resolve gets satisfied with an NSubstitute mock.
  public IRegistration Load(string key, Type service, IDictionary arguments)
  {
    return Component.For(service).Instance(Substitute.For(new[] { service }, null));
  }
}

And you’re done!  Here’s a simple unit test example using only the code from above:

[Test]
public void IDictionary_Add_Invoked()
{
  var dict = c.Resolve<IDictionary>();
  dict.Add(1, 1);
  dict.Received().Add(1, 1);
}

That was almost too easy.
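To see why this is handy beyond mocking a lone interface, here’s a slightly bigger example.  OrderService and IOrderRepository below are made-up types for illustration; the point is that the class under test is registered normally, while its dependency is auto-mocked by the loader and can be resolved again afterwards for verification:

public interface IOrderRepository { void Save(string order); }

public class OrderService
{
  private readonly IOrderRepository _repository;
  public OrderService(IOrderRepository repository) { _repository = repository; }
  public void Place(string order) { _repository.Save(order); }
}

[Test]
public void OrderService_Place_Saves_The_Order()
{
  c.Register(Component.For<OrderService>());  // the real class under test is registered normally
  var service = c.Resolve<OrderService>();    // IOrderRepository is substituted automatically

  service.Place("widget");

  c.Resolve<IOrderRepository>().Received().Save("widget"); // same mock instance that was injected
}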

Sunday, December 5, 2010

Working in Git to Working in Mercurial

I took the plunge a couple of weeks ago, learned how to use Git, and fell in love with its simplicity.  I don’t know what it is, but after using Git every day it actually started to make sense that git checkout is used for so many things, which is ironic, because before I started using Git, every introductory tutorial I read left me thinking, “checkout does what and what and what now??”

So why did I switch to Mercurial?  I need to push/pull from a Subversion repository, and I work on Windows.  General day-to-day work was great, but whenever I needed to git svn rebase or git svn dcommit it would take so long that I simply left to get coffee.  What’s worse, no matter what I set core.autocrlf to, I would always get some weird whitespace merge error when nothing was actually wrong.  It became a regular part of my workflow to just git rebase --skip everything, because that’s what fixed things.  Scary.

The crappy Subversion and whitespace support (at least on Windows) led me to try Mercurial.  Once I got accustomed to Mercurial, I found the two are actually a lot more similar than they are different.

First things first, you need to add the following to your mercurial.ini file in your home directory:

[bookmarks]
track.current = True
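Note that in some Mercurial versions bookmarks started out as a bundled extension rather than a core feature, so if hg book isn’t recognized, you may also need to enable it in the same file:

[extensions]
bookmarks =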

Here’s a comparison of my typical Git workflow translated to Mercurial:

Step                       | Git                              | Mercurial
Work on a new feature/bug  | git checkout -b new_feature      | hg book new_feature
Hack hack hack             | git add .                        |
Commit                     | git commit -m "done!"            | hg commit -m "done!"
Hack more and commit       | git commit -a -m "more hacking"  | hg commit -m "more hacking"
Sync with upstream         | git pull master                  | hg pull
Merge                      | git checkout master              | hg merge new_feature
                           | git merge new_feature            | hg commit -m "merged"
Push                       | git push                         | hg push

Notice that they’re practically identical?  There are some minor differences: the obvious one is that you need to add changes to Git’s index before committing, and the other is that I didn’t have to switch branches in Mercurial (more on that later).  But aside from that, it’s pretty much the same from a user’s point of view.  Check this fantastic thread on StackOverflow for one of the best comparisons on the net if you want to dive into the technical details.

So far, only two things have bothered me since switching:
  a) No fast-forward merges.  You need to manually rebase and then commit.
  b) No automatic commit after a conflict-free merge.

These are fairly minor annoyances, but can be scripted/aliased away.  Or if you don’t like those defaults in Git, you can override them in the command arguments or set them in gitconfig.  Mercurial similarly can change default arguments in its hgrc/mercurial.ini.
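For example, overriding Git’s defaults from the command line (both are standard git merge flags) gives you the Mercurial-like behaviour whenever you want it:

git merge --no-ff new_feature      # always record a merge commit, even when a fast-forward is possible
git merge --no-commit new_feature  # perform the merge but stop before the automatic commit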

One of the biggest advantages of Git is easily switching between all its local lightweight branches at lightning speed.  Tangled source code is no more!  I was confused by Mercurial’s bookmarks when I first used them, because all a bookmark does is put a label on a head.  That’s effectively the same thing as git checkout -b, but for some reason I had visualized Git “branching” the latest commit into two paths.  How can it do that when the two branches are still identical?  It can only really diverge after a commit introduces changes, which is exactly what Mercurial does.  In this scenario, Mercurial is more explicit, whereas Git is implicit.

Using bookmarks, you can mimic Git’s workflow like this:

Step                               | Git                           | Mercurial
Create a bunch of 'working areas'  | git branch feature1           | hg book feature1
                                   | git branch feature2           | hg book feature2
Switch/commit/hack/commit          | git checkout feature1         | hg up feature1
                                   | git commit -a -m "feature1"   | hg commit -m "feature1"
                                   | git checkout feature2         | hg up feature2
                                   | git commit -a -m "feature2"   | hg commit -m "feature2"
Sync upstream                      | git pull master               | hg pull
                                   | git checkout master           | hg up default
Merge in 1 feature                 | git merge feature1            | hg merge feature1
                                   |                               | hg commit -m "merged"
Delete branch/bookmark             | git branch -d feature1        | hg book -d feature1
Push                               | git push                      | hg push -r tip
Switch back and hack again         | git checkout feature2         | hg up feature2

The -r tip switch on hg push might have raised an eyebrow.  It tells Mercurial to push only the changes that lead up to the tip.  That will include the changes in feature1 that we just merged in, but exclude all the ones in feature2.  If you issue a plain hg push, it will complain and warn you that you’re going to create multiple heads on the remote repository.  This is not what you want, since those extra heads may contain unfinished features, experimental branches, etc.  Of course, you can force it anyway, but that’s not a good idea.

At first, the tip was the most confusing part for me, because I tried to associate it with Git’s master, which is simply not what it is.  The tip is just the newest changeset that you know about, and it can be on any branch or bookmark.  Once I understood that, and stopped trying to create a Git master equivalent, everything was straightforward.

So let’s start with an example.  After syncing, I create a bookmark feature1; making a commit then looks like this:

[image: revision graph with a single new commit at the head, labelled with the feature1 bookmark]

And then if you switch to feature2 and make a commit, it becomes like this:

[image: revision graph with two heads, one labelled feature1 and one labelled feature2]

Here’s where you start thinking, “I don’t have a master which tracks upstream changes, so how do I separate my local changes?”  And this is a situation where Mercurial does more black magic than Git.  If there are upstream changes, after issuing hg pull, this happens:

[image: revision graph after hg pull, with the pulled upstream changes forming their own head alongside the feature1 and feature2 heads]

Mercurial automatically split my local changes into their own separate branches (actually, heads is the accurate term).  After this operation, tip is now tracking the changes that I just pulled in from upstream, instead of my bookmark.
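If you want to see this for yourself at that point, the standard commands will show it (output omitted here):

hg heads          # lists every head: the freshly pulled changes plus the feature1 and feature2 bookmarks
hg log -r tip     # shows that tip now points at the newest changeset, i.e. what was just pulled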

From here, it’s simply hg up tip to switch to the “master” branch.  Then a hg merge feature1; hg commit, and it becomes like this:

[image: revision graph after merging feature1 into the pulled head and committing the merge]

Then it’s simply hg push -r tip, and you’re good to go.  Basically, if you bookmark every feature/bug you work on, then you should only have one branch that doesn’t have a bookmark, which is the same as the ‘master’ branch from Git.

What about Subversion?

Whoops, I forgot why I switched in the first place.  First, install hgsubversion; once that’s set up, simply:

hg clone http://path/to/subversion/repository

And then it’s just hg pull or hg push.  Is it really that simple?  Yes, yes it is.
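For completeness, “installing” hgsubversion mostly amounts to enabling it like any other extension in mercurial.ini; the path below is just a placeholder for wherever you cloned or installed it (the value can be left empty if it’s already on Python’s path):

[extensions]
hgsubversion = C:\path\to\hgsubversion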

Saturday, December 4, 2010

CQRS: Building a “Transactional” Event Store with MongoDB

As you already know if you’re familiar with MongoDB, it does not support transactions.  The closest thing we have is atomic modification of a single document.

The event store in a CQRS architecture has the important responsibility of detecting concurrency violations, where two different sources try to update the same version of an aggregate.  The one that arrives last should have its changes rejected with an exception.  This ensures the integrity of the data.

Here is a very simple typical implementation of appending events into the event store:

public void Append(Guid id, long expectedVersion, IEnumerable<IEvent> events)
{
  try
  {
    _events.Insert(events.Select(x => ...)); // convert to storage type
  }
  catch (...)
  {
    if (E11000 duplicate key)
      throw new ConcurrencyException(...);
  }
}

The syntax is a mix of C# and pseudocode, but the basic concept is the same.  This assumes that you’ve set up a unique compound index on the collection covering the aggregate ID and the version.  Thus, when you insert something that already has a matching ID/version pair, Mongo will report a duplicate key violation (the E11000 error above), and all is good.

But wait!  Operations are atomic per document!  So what happens if you append 100 events, and it fails on the 43rd one?  Events 1 through 42 will continue to exist in the data store, which is bad news.

Obviously, this solution is not going to work.  The next step was to do something like this:

catch (...)
{
  if (E11000 duplicate keys)
  {
    foreach (var e in events)
      _events.Delete(new { _id = e._id });
 
    throw new ConcurrencyException(...);
  }
}

So, before inserting into the collection, each event gets a generated ObjectId, so that if the insert fails, the catch block can simply tell the data store to delete every document in the batch.

At first glance this seems to fix everything, except for one glaring problem: what happens if you lose the connection to the database before, or midway through, sending the deletes?  Now you have to guarantee that those deletes eventually happen, which raises the question of where to store them.  A local file?  Another database?  And in the meantime, if another process in the system queries all events for the same aggregate, it will get back invalid data.

So, we’re back to square one.  We need to simulate a transaction through a single insert.

The secret is in the schema design.  Initially, we started out with a straightforward row-per-event schema.  But since we’re working with documents, we can model a whole batch of events as a single document.

Thus, instead of versioning every event individually, we version a batch of events.  For example, originally we would insert 3 events, and the data saved would look like this:

{ _id = 1, aggregate_id = 1, version = 1, event = { … } }
{ _id = 2, aggregate_id = 1, version = 2, event = { … } }
{ _id = 3, aggregate_id = 1, version = 3, event = { … } }

In the new schema, it would look like this:

{ _id = 1, aggregate_id = 1, version = 1, events = [ { … }, { … }, { … }, { … } ] }
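To make that concrete, here’s a minimal sketch of the batched append using the current official MongoDB C# driver (a much newer API than what existed when this was written); the collection name, field names, and ConcurrencyException type are illustrative:

using System;
using System.Collections.Generic;
using MongoDB.Bson;
using MongoDB.Driver;

public class ConcurrencyException : Exception
{
  public ConcurrencyException(string message, Exception inner) : base(message, inner) { }
}

public class MongoEventStore
{
  private readonly IMongoCollection<BsonDocument> _commits;

  public MongoEventStore(IMongoDatabase database)
  {
    _commits = database.GetCollection<BsonDocument>("commits");

    // The unique compound index on (aggregate_id, version) is what turns a
    // conflicting insert into a duplicate key error.
    _commits.Indexes.CreateOne(new CreateIndexModel<BsonDocument>(
      Builders<BsonDocument>.IndexKeys.Ascending("aggregate_id").Ascending("version"),
      new CreateIndexOptions { Unique = true }));
  }

  public void Append(Guid aggregateId, long expectedVersion, IEnumerable<BsonDocument> events)
  {
    // One document per batch of events: a single insert is atomic, so either
    // the whole batch is stored or none of it is.
    var commit = new BsonDocument
    {
      { "aggregate_id", aggregateId.ToString() },
      { "version", expectedVersion + 1 },
      { "events", new BsonArray(events) }
    };

    try
    {
      _commits.InsertOne(commit);
    }
    catch (MongoWriteException e) when (e.WriteError.Category == ServerErrorCategory.DuplicateKey)
    {
      // Someone else already stored this version of the aggregate.
      throw new ConcurrencyException("Concurrency violation on aggregate " + aggregateId, e);
    }
  }
}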

Now, a downside to this approach is that you lose a bit of granularity in the stored events, since you’re grouping multiple events under a single version.  However, I don’t see this as a huge loss, since the main reason to use event sourcing in the first place is to be able to restore an aggregate to any state in its history, and we still retain that ability.

In our case, this is working very well for us.  When a command gets handled, it generates a bunch of events that get applied and then saved to MongoDB.  I can’t think of any scenario where I’d want to replay to the middle of a half-processed command (though it’s possible anyway: just replay half of a batch of events), but that’s just asking for trouble.  It’s most likely easier to just re-process the command.

Now, you may be asking why we go through the trouble of batching events when we could just store one document per aggregate and put all of its events inside it.  Yes, that would solve the problem very effectively…until you hit the 4MB per-document limit ;-)