Thursday 9 April 2009

Tricky recursive loop and power of logging

Recently I was working on some import functionality which was written earlier by third person. I had to move some data between two separate systems and I was assured that tool provided to me was working fine. So, yes it worked, but processing the 1MB Xml file took him about 30 minutes!
God, I know that complicated operations like creating, modifying, managing some big structures can last long, but why it took so incredibly long?

I've started coming to terms with the processing time, but then for completely different reason I have added some logging. It was very simple, I logged just most important things - successful and failed imports. I ran tool, and after another half hour I started to browse my log file. What I've noticed first was that log file had over 8MB, it was surprising, but I left it and started to look for failed updates (which interested me then).

And then flash of inside! Each thing that should be updated and saved once, in fact was updated and saved about hundred of times. So I ran into code and than I found that method:

    protected void ProcessNodes(XmlNodeList nodelist, string culture)

    {

        for (int i = 0; i < nodelist.Count; i++)

        {

            XmlNode node = nodelist[i];

            if (node.Name == "page")

                ProcessNode(node, culture);

            if (node.HasChildNodes)

            {

                for (int i2 = 0; i2 < node.ChildNodes.Count; i2++)

                {

                    if (node.ChildNodes[i2].Name == "page")

                    {

                        ProcessNode(node, culture);

                        ProcessNodes(node.ChildNodes, culture);

                    }

                }

            }

        }

    }


Many of you see it immidietaly, but when you browse somebody's code which has more than five lines and didn't analyze that line by line, then you propably going to miss it. The problem with that code is that its not a infinity loop, its actually works, but really, really slow.
Each element is saved multiple times, in fact as many times as many children it has, and additional one unnecessary time inside the loop. So, for node which has 5 childrens, 5 siblings and which parent has 3 siblings this node will be updated and saved about 75 times!!!
Yeah, that was sneaky:)

I learned two thing from there:

1. Be careful even when you do a trivial things. The easiest things are the most difficult to to find, because normally you would just skip that bit thinking, 'oh, another loop...'.
2. Always log your actions. Simple log file contains more informations than you can image.

Oh, actually there's one more... You should never trust others code:P