Should we Optimize Code?


Programming is a complex and relative new science, one would expect it to have a lot of misconceptions and myths. The problem is proving something is a myth and knowing which ideas to question.  Recently I've heard statements like: "Memory is cheap, size optimizations are not important". and "Assembly optimization is complex and it is best to study the AMD and Intel manuals". Both of these statements give me the willies. They both appear reasonable on the surface, but there is this nagging feeling something isn't right. OK, so feelings are not worth much, do we have any facts?

Since I write a lot of small utilities, this seemed like a good place to begin. First, I timed two sort programs. The standard GNU sort (quick sort) and the example sort program in AsmIDE. This produced the following table:


program  name
short sort timing long sort timing
asm sort
172ms
200ms
GNU sort
173ms
231ms

These are hard facts that anyone can verify. They also provide a lot of information. First we can determine the operating system overhead by looking at the short-sort times. The short-sort sorts 3 lines and all the time is essentially due to operating system start up. This includes loading the program, setting up memory, handling file i/o etc.  We now know the operating system hogs most of the time for small utilities. A sort program makes heavy use of the CPU which can be seen from the long-sort times. This gives us a nice way to compare programs, and languages. In this case one might assume the assembly program is about 20 percent faster. Wrong, it is about twice as fast.

To compare the two programs, first we need to remove the operation system overhead. This leaves the time spent in the program. In this case we see that the GNU sort used 58ms and the assembly sort about 23. Humm, I wonder if how many benchmarks make this correction? We are getting a little off track, back to the question of size optimization.

We don't know how much program size impacts these times. It appears the asm program has 1ms less overhead at start up. This could be due to a slightly smaller size. Not a big number, but what happens if we run the test immediately after boot (before the cache is filled). Bingo, both programs run slower and the slightly smaller assembler picks up another 2ms.

How about big programs, would they run even slower because of start up time? Obviously, they would. This was easily verified by running some programs with only a help message displayed. The times showed a strong relationship to size.

We have just proven utilities that are swapped in and out of memory can be optimized for speed by reducing their size. Seems obvious.. but what about the statement "memory is cheap and optimizing for size unimportant". It is probably true for device drivers and compiler writers, but doesn't fit our test cases very well.

OK, so how about optimizing using the AMD and Intel manuals? I do know the assembler sort was not optimized and might even be considered brain dead. About the only optimization was use of the merge sort algorithm. The disk i/o is terrible. Still, it is twice as fast as the GNU program. This is a fact, but possibly the GNU program requires more time to handle the extra features it provides? We can't prove much about CPU optimization with one test case. This is a tough nut, because programmers have different styles. About all we can do is gather a lot of statistics and see if a pattern emerges.

So.. off I went to gather some statistics. I know that most compilers do CPU optimization and I know of one set of programs that do zero CPU optimization (everything I've written). We also have a few asm programs optimized for size. The e3 editor and a few others at Linux-Assembly were available. The windows platform has a lot more, but for simplicity lets stay with Linux.

The results were messy but all the assembly programs were very fast, and the compiler programs went from awful to almost as good. This data still isn't conclusive, but it does suggest optimizing assembler for speed isn't always important. Just about anything in assembler beats optimized compiler code. It also looks like program size often impacts run time more that speed optimizations and reading manuals.

What do you think?




Fork me on GitHub