b.ling on software development: April 2009

Tuesday, April 28, 2009

A Trick Question With Closures

Given the following code, what do you expect the output to be?

for (int i = 0; i < 100; ++i)
{
    ThreadPool.QueueUserWorkItem(delegate
    {
        Console.WriteLine(i);
    });
}

Keep that answer in your head! Now...what do you expect the output of the following?

int i;
for (i = 0; i < 100; ++i)
{
    ThreadPool.QueueUserWorkItem(delegate
    {
        Console.WriteLine(i);
    });
}

Were your answers the same? Different? Why?
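
One way to poke at the question without the thread pool's timing noise is to store the delegates and only run them after the loop has finished. This is a minimal sketch, not part of the original puzzle; ClosureCaptureDemo and its method names are made up for illustration. It contrasts capturing the loop variable itself with capturing a fresh copy made inside the loop body.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical demo class; the names are illustrative only.
public static class ClosureCaptureDemo
{
    // Every delegate captures the one loop variable i, so when they run
    // after the loop they all see its final value.
    public static List<int> CaptureLoopVariable(int n)
    {
        List<Func<int>> funcs = new List<Func<int>>();
        for (int i = 0; i < n; ++i)
        {
            funcs.Add(delegate { return i; });
        }
        return Run(funcs);
    }

    // Each iteration declares its own copy, so each delegate captures a
    // different variable holding that iteration's value.
    public static List<int> CaptureCopy(int n)
    {
        List<Func<int>> funcs = new List<Func<int>>();
        for (int i = 0; i < n; ++i)
        {
            int copy = i;
            funcs.Add(delegate { return copy; });
        }
        return Run(funcs);
    }

    // Invokes the stored delegates in order and collects what they return.
    static List<int> Run(List<Func<int>> funcs)
    {
        List<int> results = new List<int>();
        foreach (Func<int> f in funcs)
        {
            results.Add(f());
        }
        return results;
    }
}
```

With n = 3, CaptureLoopVariable returns 3, 3, 3 while CaptureCopy returns 0, 1, 2. With the thread pool the picture is murkier, because a work item may run while the loop is still counting.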

Monday, April 27, 2009

A Simple Thread Pool Implementation (Part 2)

I said last time I'd compare the performance of my super duper simple implementation against the .NET ThreadPool, so here it is! Here's the test code:

ManualResetEvent evt = new ManualResetEvent(false);
int total = 100000;
int count = 0;
DateTime start = DateTime.Now;
for (int i = 0; i < total; ++i)
{
//  ThreadQueue.QueueUserWorkItem(delegate(object obj)
    ThreadPool.QueueUserWorkItem(delegate(object obj)
    {
        Console.WriteLine(obj);
        if (Interlocked.Increment(ref count) == total)
            evt.Set();
    }, i);
}
evt.WaitOne();

Console.WriteLine("Time: {0}ms", (DateTime.Now - start).TotalMilliseconds);
Console.ReadLine();

Here are the initial tests. My CPU is a Core2Duo clocked at 3.6GHz. I also ran a release build outside of the debugger. I set my ThreadQueue to have 25 worker threads. The results were pretty surprising:

ThreadPool: 1515.6444ms
ThreadQueue: 5093.8152ms

Well, that's interesting. The .NET ThreadPool whooped my butt! How can that be? Let's dig deeper and find out how many threads actually got used. I used ThreadPool.GetMinThreads and ThreadPool.GetMaxThreads, and I got 2-500 for worker threads, and 2-1000 for I/O completion threads. Then, I tracked how many threads were actually used, like this:

Dictionary<int, int> threads = new Dictionary<int, int>();
// ...snip
    ThreadPool.QueueUserWorkItem(delegate(object obj)
    {
        Console.WriteLine(obj);
        // Dictionary is not safe for concurrent writes, so guard the update.
        lock (threads)
        {
            threads[Thread.CurrentThread.ManagedThreadId] = 0;
        }
        if (Interlocked.Increment(ref count) == total)
            evt.Set();
    }, i);
// ...snip
Console.WriteLine("Threads used: {0}", threads.Count);

Surprisingly, only 2 threads were used. So, I set the number of worker threads of my ThreadQueue to 2. Voilà! 1437.5184ms, which is just a little under 100ms faster than the ThreadPool. I guess this shows that more threads do not mean better or faster! For fun I set the number of worker threads to 200 and it took 27906.6072ms! There was clearly a lot of locking overhead here...

A Simple Thread Pool Implementation

I've always wanted to do this, and finally I've gotten around to doing it. Here's a super duper simple implementation of a working thread pool. I named it ThreadQueue just so it's clearly discernible from the one provided by the framework.

public static class ThreadQueue
{
    static Queue<WorkItem> _queue = new Queue<WorkItem>();

    struct WorkItem
    {
        public WaitCallback Worker;
        public object State;
    }

    static ThreadQueue()
    {
        for (int i = 0; i < 25; ++i)
        {
            Thread t = new Thread(ThreadWorker);
            t.IsBackground = true;
            t.Start();
        }
    }

    static void ThreadWorker()
    {
        while (true)
        {
            WorkItem wi;
            lock (_queue)
            {
                while (_queue.Count == 0)
                {
                    Monitor.Wait(_queue);
                }
                wi = _queue.Dequeue();
            }
            wi.Worker(wi.State);
        }
    }

    public static void QueueUserWorkItem(WaitCallback callBack, object state)
    {
        WorkItem wi = new WorkItem();
        wi.Worker = callBack;
        wi.State = state;
        lock (_queue)
        {
            _queue.Enqueue(wi);
            Monitor.Pulse(_queue);
        }
    }
}

As you can see, it's very short and very simple. Basically, if it's not used, no threads are started, so in a sense it is lazy. When you use it for the first time, the static initializer will start up 25 threads and put them all into a waiting state (because the queue will initially be empty). When something needs to be done, it is added to the queue, and the queue then pulses a waiting thread to perform the work. And just to warn you, if the worker delegate throws an exception it will crash the pool, so if you want to avoid that you will need to wrap the wi.Worker(wi.State) call with a try/catch.

I guess at this point you may wonder why one should even bother writing a thread pool. For one, it's a great exercise and will probably strengthen your understanding of how to write multithreaded applications. The above thread pool is probably one of the simplest use cases for Monitor.Pulse and Monitor.Wait, which are crucial for writing high-performance threaded applications.

Another reason is that the .NET ThreadPool is optimized for many quick-ending tasks. Most of the framework's asynchronous operations go through the ThreadPool (think BeginInvoke, EndInvoke, BeginRead, EndRead, etc.). It is not well-suited for any operation that takes a long time to complete. MSDN recommends that you use a full-blown thread to do that. Unfortunately, there's a relatively big cost to creating threads, which is why we have thread pools in the first place! Hence, to solve this problem, we can write our own thread pool that keeps some threads alive and waiting to perform longer operations, without clogging the .NET ThreadPool.

In my next post I'll compare the above implementation's performance against the built-in ThreadPool.
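
The try/catch mentioned above could look something like the following. It's a minimal sketch under my own assumptions: SafeWorker is a made-up helper, not part of ThreadQueue, and writing the failure to Console.Error is just one arbitrary way of handling it.

```csharp
using System;
using System.Threading;

// Hypothetical helper; the name and the logging choice are assumptions.
public static class SafeWorker
{
    // Runs one work item and catches any exception so the calling worker
    // thread keeps running. Returns the caught exception, or null when the
    // work item completed normally.
    public static Exception Invoke(WaitCallback worker, object state)
    {
        try
        {
            worker(state);
            return null;
        }
        catch (Exception ex)
        {
            Console.Error.WriteLine("Work item failed: {0}", ex.Message);
            return ex;
        }
    }
}
```

Inside ThreadWorker, the line wi.Worker(wi.State) would then become SafeWorker.Invoke(wi.Worker, wi.State). Whether swallowing every exception is the right policy depends on the application, so treat this as a starting point.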

Saturday, April 25, 2009

SourceGear Vault, Part 4, Conclusion

I suppose I should give a disclaimer: I am *not* an expert with Vault, and my opinions may be entirely due to my lack of understanding of the system. With that in mind, here's what my experience of using Vault has been so far.

In general, it is not as fast as TortoiseSVN. Many operations that are instant in SVN are met in Vault with 10 seconds of "beginning transaction" and "ending transaction", sometimes more.

Feature-wise, Vault has much more to offer than Subversion does. Basically, it can do most of what Subversion can do, plus features from SourceSafe (like sharing, pinning, labeling, etc.). However, as I mentioned before, Vault has no offline support whatsoever, and you cannot generate patches, so it effectively cuts off any kind of outside collaboration. You could say that this is fine because SourceGear's target audience is small organizations where everyone will have access to the network anyway, but that doesn't mean you won't be sent off to the middle of nowhere with no internet access and a bug that needs fixing now! Not to say that Subversion is much better in that scenario, but at least you can still revert to the last updated version.

Friday, April 24, 2009

SourceGear Vault, Part 3 (Performance vs Subversion)

I downloaded kernel 2.6.29.1 and extracted it to the working folder. I figured this was an easy way to get a real-world set of changes (albeit a rather big one, since it contains everything that changed between 29 and 29.1).

Anywho, I was pretty surprised to find out that Vault does not have any feature whatsoever to allow collaboration between programmers other than via the client. You cannot create a .patch file and email it to someone. Everyone is assumed to have access to the server. This thwarted my plans to test the performance of diffing the changeset, because I simply planned on comparing how long it would take to create the patch. Ah well, I guess I'll just have to compare the performance of committing the changes between 29 and 29.1, which is a common use case as well so I don't mind.

Time it took to a) start up the Vault client, b) bring up the Vault commit dialog, or c) bring up the TortoiseSVN commit dialog:

Vault startup: ~1m02s
Vault commit: ~7m55s
Subversion commit: ~4m33s

Time it took to commit:

Vault: ~13m45s, and at 14m34s the HD stopped spinning
Subversion: ~1m42s

Hmmmmm, it doesn't look like Vault did too well in this comparison. Getting a status of all changed files took almost twice as long compared with Subversion, and what's worse, committing with Vault took 12 minutes longer than it did with Subversion. The extra minute or so of hard drive activity can be attributed to SQL Server completing the transaction, which is why I separated that part: as far as the client was concerned, the operation was complete after 13m45s.

One more quick test...branching. Subversion is famous for its O(1) time to branch anything, from the smallest to the biggest. SourceGear's page of Vault vs Subversion mentions that both offer cheap branches. Let's see that in action! I used TortoiseSVN's repo-browser and branched the kernel. It was practically instant, with no wait time for the operation to finish. Vault, on the other hand, took a total of 47 seconds from the moment I committed the branch to when the status said "ready" again.

Thursday, April 23, 2009

SourceGear Vault, Part 2 (Performance vs Subversion)

Obviously you can't compare something without performance numbers! So in this post I'll post some performance numbers for the most common operations. I'll be using the Linux kernel for no reason other than it's readily available and everyone knows about it. The size of the code base is also relatively large (depending on who you ask). In general, I "felt" that Vault is slower than Subversion. This is probably due to the .NET runtime startup time for most operations, which is negligible for anything but trivial operations.

Test environment:

Core2Duo clocked at 3.6GHz
4GB of RAM
Vista 64bit/IIS7/SQL2008Express
Vault 5.0 Beta 1 (I'm hoping the beta isn't going to affect performance numbers drastically)
TortoiseSVN 1.6.1
Linux kernel 2.6.29 (roughly 300MB of source code and more than 26000 files)

Both server and client are on the same computer, so none of these scenarios will really test network performance. Anyway, on with the tests.

Time it took to recursively add all folders and files and just bring up the confirmation dialog:

Vault: ~14s
Subversion: ~11s

Time it took to mark all files to add (create a changeset):

Vault: 0s (it was pretty much instant)
Subversion: ~6m52s

Time it took to commit:

Vault: ~21m10s
Subversion: ~28m50s

Total:

Vault: ~28m16s
Subversion: ~35m53s

As you can see, Vault was slightly slower at a trivial operation like working out which files to add, but once it had some real work to do, it ended up being faster overall than Subversion. A significant part of the Subversion time was devoted to creating .svn folders.

After this, it was apparent that Subversion has a pretty major advantage over Vault...working offline. Vault does not appear to keep any offline information. This was confirmed when I went into offline mode in Visual Studio and pretty much all functionality was disabled except for "go online." Basically, when you're working offline in Vault you can no longer diff against the original version or revert changes, or do much of anything else. Once you go online it scans your working copy for any changes and the "pending changes" will update.

I don't like super long posts, so there will most definitely be a part 3 where I'll continue on with some common operations.

Wednesday, April 22, 2009

SourceGear Vault, Part 1

I'll be starting a new job soon, and the company is using SourceGear Vault, so I went ahead and downloaded the latest beta version off the site (since it's free for single users) to see what it was like.

I've hated SourceSafe from the first time I used it. The nail in the coffin was when my data got corrupted and some of my work was lost. At the very least, no source control management system should ever lose any work. Unfortunately, my current workplace is still using SourceSafe and I was unable to convince them otherwise. I've coped with the situation by using Mercurial for a quick local repository, and checking in changes back to VSS for major changesets. The choice to go with hg instead of git or bzr was mainly because I'm a Windows developer and hg is currently way ahead of the other two on Windows.

Anyways, since Vault is frequently advertised as SourceSafe done right, I was curious as to how it would affect my opinion of how things "should" have worked. I'm a long-term user of Subversion, and most of my experience of actually using SCM is with svn. I used hg at work just to see what all this DVCS hype is about.

With that in mind, setting up Vault was certainly more eventful than I expected, since it involved setting up SQL Server and IIS. After all the requirements were taken care of (the most time-consuming part), installing Vault was pretty quick. The administration interface is simple and straightforward, and had a repository created by default. Then, the real fun began! Tune back in for part 2!

Thursday, April 9, 2009

No more cross thread violations!

“Cross-thread operation not valid: Control ### accessed from a thread other than the thread it was created on.” Seriously, I don’t think there’s a single person who’s written UI programs for .NET who has not encountered this error. Simply put, there is only one thread that draws on the screen, and if you try to change UI elements from a thread that’s not the UI thread, this exception is thrown. A very common example is a progress bar. Let’s say you’re loading a file which takes a long time to process, so you want to notify the user with a progress bar.

  public event EventHandler<IntEventArgs> ValueProcessed;
  private void StartProcessing() {
    ValueProcessed += ValueProcessedHandler;
    Thread t = new Thread(delegate() {
      for (int i = 0; i < 1000; ++i) {
        // do stuff
        ValueProcessed(this, new IntEventArgs(i));
      }
    });
    t.Start();
  }
  private void ValueProcessedHandler(object sender, IntEventArgs e) {
    _progressBar.Value = e.Value;
  }

I left some implementation details out, but you can pretty much infer what IntEventArgs is, etc. Once you try to set Value, it’ll throw a cross-thread exception. A common pattern to solve this problem is this:

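  private void ValueProcessedHandler(object sender, IntEventArgs e) {
    if (this.InvokeRequired) {
      // BeginInvoke takes a Delegate, so the method group needs an explicit delegate type.
      this.BeginInvoke(new EventHandler<IntEventArgs>(ValueProcessedHandler), sender, e);
    } else {
      _progressBar.Value = e.Value;
    }
  }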

It gets the job done…but frankly I’m too lazy to do that for every single GUI event handler I write. Taking advantage of anonymous delegates we can write a simple wrapper delegate to do this automatically for us.

static class Program {
  // Assumed declaration: set this to your main form at startup.
  public static Control MainForm;

  public static EventHandler<T> AutoInvoke<T>(EventHandler<T> handler) where T : EventArgs {
    return delegate(object sender, T e) {
      if (Program.MainForm.InvokeRequired) {
        Program.MainForm.BeginInvoke(handler, sender, e);
      } else {
        handler(sender, e);
      }
    };
  }
}

This assumes that you set Program.MainForm to an instance of a Control, typically, as the name implies, the main form of your application. Now, whenever you assign your event handlers, you can simply do this:

    ValueProcessed += Program.AutoInvoke<IntEventArgs>(ValueProcessedHandler);

Pretty neat! BTW, just a word of warning: if you plan on using the same thing to unsubscribe from an event with -=, it's not going to work. Unfortunately, calling the AutoInvoke method twice on the same input delegate returns two different delegates, for which .Equals and == both return false. To get around this you can use a Dictionary to cache the wrapper for each input delegate.
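
Here is a rough sketch of what that Dictionary cache might look like. It is only one possible approach, with a few assumptions of mine: AutoInvokeCache and Target are made-up names, and Target is typed as ISynchronizeInvoke (which Control implements) so the snippet does not depend on Windows Forms. If Target is left null, the handler is simply called directly.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;

// Hypothetical variant of AutoInvoke that remembers the wrapper it handed out.
public static class AutoInvokeCache
{
    // Stand-in for Program.MainForm; any Control can be assigned here.
    public static ISynchronizeInvoke Target;

    // Maps each input delegate to its wrapper. Delegates built from the same
    // method and target compare equal, so they land on the same entry.
    static readonly Dictionary<Delegate, Delegate> _wrappers =
        new Dictionary<Delegate, Delegate>();

    public static EventHandler<T> AutoInvoke<T>(EventHandler<T> handler) where T : EventArgs
    {
        lock (_wrappers)
        {
            Delegate cached;
            if (_wrappers.TryGetValue(handler, out cached))
            {
                return (EventHandler<T>)cached;
            }

            EventHandler<T> wrapper = delegate(object sender, T e)
            {
                ISynchronizeInvoke target = Target;
                if (target != null && target.InvokeRequired)
                {
                    // Marshal the call over to the thread that owns the target.
                    target.BeginInvoke(handler, new object[] { sender, e });
                }
                else
                {
                    handler(sender, e);
                }
            };
            _wrappers[handler] = wrapper;
            return wrapper;
        }
    }
}
```

Since the same wrapper comes back each time, += and -= with AutoInvokeCache.AutoInvoke<IntEventArgs>(ValueProcessedHandler) should pair up. The catch is that the cache keeps every handler (and whatever object it belongs to) alive for as long as the process runs, so a real version would probably want a way to remove entries.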

Digsby

it's pretty sweet. you can connect to every IM service known to man, and it even checks all your email, facebook, and twitter too with nice popup notifications! it only works for windows at the moment, but the guys over there are working hard and pushing out updates very frequently. i'm alpha testing and even so i haven't seen any crashes or any major problems. there have been the minor quirks here and there, but that's to be expected when you're alpha testing. check it out at digsby.com. also, you can embed a chat client into your website...like this:

bling.github.io