Yesterday I was given the task of improving the performance of a feature. I was told that the feature was written in a very straightforward way, with almost no optimizations. The problem was that it took several minutes (yes, minutes) to run, rather than a few seconds or less.
The story is quite long, so if you have the time and patience, read the whole thing. If not, or if you are just too lazy, skip to the summary at the end.
Identifying The Problem
The code computed the visible area from a given geographic point and a DTM (digital terrain model). The GIS package used was ESRI's. While the ESRI tools must have that feature hidden somewhere, it was not used for several reasons, including performance. However, the result, performance-wise, was no better.
Using ProcExp from the Sysinternals Suite, I wanted to see just how much CPU the process was using. I saw that it was using about 70% in total. That’s a lot, but the interesting part was that about half of it was actually kernel time. This usually means a lot of I/O.
So I launched ProcMon to see what happens while the process runs. It seemed that for every height sample of the DTM at a given coordinate, the ESRI package accessed the actual DTM file (a GeoTIFF in this case).
Solving The Problem – Phase 1
I looked deep into the ESRI API to figure out how to prevent that from happening. Since I couldn’t find any decent way of getting ESRI to do it for me, I just used another ESRI API that reads all the data into memory. Strictly speaking, this calls for a proper memory manager (I didn’t build one, by the way; I just loaded everything into memory, since the files are currently small enough to fit. But I’ll get to it…), but it is still better than accessing the disk for every request.
Joyful and happy that I’d eliminated the constant file access, I ran the program again. It showed only a very minor improvement, and still too much kernel time.
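To make the idea concrete, here is a minimal sketch of "read once, then serve every sample from memory". It is not the ESRI API: `SlowDtmSource`, `read_sample`, `read_all` and `InMemoryDtm` are hypothetical names invented for illustration, and the "file" is just a list with a counter that stands in for disk access.

```python
from typing import List


class SlowDtmSource:
    """Hypothetical stand-in for a package that touches the file on every call."""

    def __init__(self, heights: List[List[float]]) -> None:
        self._heights = heights
        self.file_reads = 0  # counts simulated disk accesses

    def read_sample(self, row: int, col: int) -> float:
        # One simulated file access per height sample (the slow pattern).
        self.file_reads += 1
        return self._heights[row][col]

    def read_all(self) -> List[List[float]]:
        # One simulated file access for the whole grid.
        self.file_reads += 1
        return [list(r) for r in self._heights]


class InMemoryDtm:
    """Loads the whole grid once; later lookups never touch the source."""

    def __init__(self, source: SlowDtmSource) -> None:
        self._grid = source.read_all()

    def height(self, row: int, col: int) -> float:
        return self._grid[row][col]


source = SlowDtmSource([[1.0, 2.0], [3.0, 4.0]])
dtm = InMemoryDtm(source)
total = sum(dtm.height(r, c) for r in range(2) for c in range(2))
```

Here the four lookups cost a single simulated read. This only works while the grid fits in memory, which is exactly the shortcut I took.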
Solving The Problem – Phase 2
Another look at ProcMon showed a lot of registry access. By asking around (well, asking my supervisor…) I was told that this is due to COM object access. “Strange”, I thought, “I thought I got rid of all COM object access…”
So the next step was to really get rid of all COM object access during that process. I tracked down several places where it was easy: the value never changes, so just put it in a primitive-type member and we’re done. The hard part was replacing the calls to methods that computed something. However, those computations weren’t so hard to do after all. They converted map coordinates to a pixel location in the DTM. Some linear algebra at its lowest level gave me a satisfying result. Now the program takes seconds instead of minutes.
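For the curious, the coordinate conversion can be sketched roughly like this. It is an illustration under assumptions, not the actual code: it assumes a north-up raster with no rotation, and `GridTransform` and its field names are hypothetical. The origin and pixel size are the kind of never-changing values that can be read once and kept in plain members.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GridTransform:
    """Constant grid parameters, read once and cached as primitives.

    Assumes a north-up raster without rotation (an assumption for this sketch).
    """

    origin_x: float      # map X of the upper-left corner
    origin_y: float      # map Y of the upper-left corner
    pixel_width: float   # map units per pixel along X (positive)
    pixel_height: float  # map units per pixel along Y (positive)

    def map_to_pixel(self, x: float, y: float) -> Tuple[int, int]:
        # Columns grow with X; rows grow as Y decreases (north-up).
        col = math.floor((x - self.origin_x) / self.pixel_width)
        row = math.floor((self.origin_y - y) / self.pixel_height)
        return row, col


transform = GridTransform(origin_x=100.0, origin_y=200.0,
                          pixel_width=10.0, pixel_height=10.0)
row, col = transform.map_to_pixel(125.0, 185.0)
```

With these made-up numbers, the point (125, 185) lands in row 1, column 2. No calls leave the process, so the inner loop is just a few subtractions and divisions.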
To sum things up:
- If using COM objects, try to minimize or even eliminate calls to them in performance-critical code, especially inside a tight loop.
- Don’t be afraid to “re-invent the wheel” if your current package is too slow. Either find a faster package, or write simple tasks yourself.
- After doing this, make sure the results are satisfactory, not only performance-wise but also in the values produced. You don’t want to screw things up just to gain more speed.
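The last point can be checked mechanically. A minimal sketch, assuming both the old and the new implementation can be called as plain functions of a point; `first_mismatch` and the two lambdas below are hypothetical stand-ins, not the real code:

```python
import math
from typing import Callable, Iterable, Optional, Tuple

Point = Tuple[float, float]


def first_mismatch(
    reference: Callable[[float, float], float],
    candidate: Callable[[float, float], float],
    points: Iterable[Point],
    abs_tol: float = 1e-6,
) -> Optional[Point]:
    """Return the first point where the two implementations disagree, or None."""
    for x, y in points:
        if not math.isclose(reference(x, y), candidate(x, y), abs_tol=abs_tol):
            return (x, y)
    return None


# Stand-ins for the slow original and the fast rewrite.
slow_version = lambda x, y: (x - 100.0) / 10.0 + y
fast_version = lambda x, y: x * 0.1 - 10.0 + y

sample_points = [(100.0, 0.0), (125.0, 3.0), (250.0, -7.5)]
mismatch = first_mismatch(slow_version, fast_version, sample_points)
```

If `mismatch` comes back as `None`, the rewrite agrees with the original on the sampled points within the tolerance. That is not a proof, but it catches the obvious blunders.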
On a more general note: improving performance (also true for efficiency) is all about getting the same effect with less work. In this case (and in most other cases you will bump into), this can be done with a couple of simple replacements of one CPU-killing method with a less costly one. Only rarely does the entire algorithm need to be replaced. Knowing the common pitfalls (or hogs) is a good place to start.