Up-front performance optimization / bit fiddling

Have you ever seen code like this in your embedded project?

uint32_t numberOfHighBits = ((someValue & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
numberOfHighBits += (((someValue & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
numberOfHighBits += ((someValue >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;

In high-performance code? Or in time-critical device drivers? I mostly saw this kind of code in places that were not time critical at all. Even on slow processors (such as FPGA soft cores), the runtime savings of this code style are negligible in most places of the code. What is not negligible is the higher cost of software maintenance, which becomes obvious when we look at a more readable alternative:

uint32_t numberOfHighBits = 0;
for (uint32_t bitPos = 0; bitPos < sizeof(someValue) * 8; bitPos++)
  if (isBitSet(someValue, bitPos))
    numberOfHighBits++;

The basic problem here is that quite a few embedded developers – especially those who spent years programming extremely resource-restricted systems in the past – do up-front performance optimization. I found some to be very proud of this habit; they feel it makes them better programmers.

Up-front performance optimization also shows up in other areas. Some embedded programmers avoid object orientation because of the performance and memory impact of a v-table. Of course there are systems so resource-restricted that a v-table is not possible. (Object orientation still is, by the way …) But on today's embedded systems, this kind of restriction usually no longer exists.

Your recent gigahertz ARM controller will also be brought to its limits in a few years, when the product manager has you add more and more features. It always was this way and it always will be. But by then it will not be because of v-tables or bit hacks. Nor would a bad programming style have saved you from it.

So, how can you ensure that your developers don't write useless bit hacks for your brand-new gigahertz ARM controller? I would suggest encouraging the development team not to do any up-front performance optimization in the code at all, at least as long as it is not 100% clear that a performance optimization is necessary. Especially if the topic is not architecture-related at all and only affects the way the code is written, as in the example above.

Well, will everyone stick to this rule? Of course not. You will still find yourself arguing why clean code is sometimes worth more than a piece of code that is hard to read but might execute faster. Let me make a suggestion for this case:

Set up a performance profiler on every single workstation/target, one that can be started as easily as possible. Set up a wiki page that explains, with screenshots, exactly how to profile. The cost of learning how to profile and of executing a profiling session must be as low as possible.

Then, when a developer has applied an unnecessary optimization, it can easily be shown whether it benefits the runtime behavior at all. It may happen that the code is not rewritten even after the developer notices that her/his performance optimization was useless. But the developer will learn from this profiling event, and the probability of the next useless up-front optimization will decrease …

In some circumstances, however – in an interrupt handler, in the nested loops of high-performance code, or when the profiler has proved that an optimization is necessary in a particular place – you will still have to, and want to, use bit hacks. In that case, have a look at this awesome website; it lists lots of examples: Bit Twiddling Hacks