What is Undefined Behavior in C++

Undefined behavior

04 Jan 2019

Although one might think so, the term “undefined behavior” does not come from politics; it is simply the opposite of “defined behavior” in programming.

I struggled with this term for a long time after I switched from the world of assembly, Basic and Pascal to C and C++.

Today I value this term very much and see it as a success of modern compilers.

For example, it is often said that accessing memory through a NULL pointer is forbidden and leads to a crash.
But that is just not true; it is simply “undefined behavior”. Or to put it another way: nobody has defined what exactly happens then, and therefore a different reaction can occur on each system.

Under today's Windows and Linux releases, the memory around the virtual address “NULL” is normally not mapped to physical memory. The access thus leads to a fault (an access violation on Windows, a segmentation fault on Linux), which the program either handles itself or which makes the operating system terminate the process: a classic program crash.

But in DOS times there were no “virtual addresses” and the memory actually started at address “0”.
Access via a NULL pointer was successful and returned the bytes stored there.
But these bytes belong to the global interrupt vector table, and if someone wrote their own data into it, the result was literally undefined behavior: on the next interrupt, the table was used to derive the address of the code to be executed, and the processor then jumped to an arbitrary address.

However, if someone deliberately wanted to replace an interrupt routine by hand, this had to be done through the NULL pointer plus an offset, and the result was then not a crash but a correct program flow.
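To make the table layout a bit more tangible, here is a minimal sketch that simulates it in an ordinary array instead of touching address 0. It assumes the real-mode layout of 256 entries of 4 bytes each (offset first, then segment, both little-endian); `FakeLowMemory`, `ivt_set_vector` and `ivt_handler_address` are names invented for this illustration and not part of any DOS API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed real-mode layout: 256 vectors of 4 bytes, starting at address 0. */
#define IVT_ENTRIES    256u
#define IVT_ENTRY_SIZE 4u

/* Hypothetical stand-in for the first 1 KiB of DOS memory. */
typedef struct {
    uint8_t bytes[IVT_ENTRIES * IVT_ENTRY_SIZE];
} FakeLowMemory;

/* Store a segment:offset handler address for the given interrupt number. */
void ivt_set_vector(FakeLowMemory *mem, unsigned int_no,
                    uint16_t segment, uint16_t offset)
{
    assert(mem != NULL && int_no < IVT_ENTRIES);
    uint8_t *entry = &mem->bytes[int_no * IVT_ENTRY_SIZE];
    entry[0] = (uint8_t)(offset % 256u);   /* offset, low byte first */
    entry[1] = (uint8_t)(offset / 256u);
    entry[2] = (uint8_t)(segment % 256u);  /* segment, low byte first */
    entry[3] = (uint8_t)(segment / 256u);
}

/* Physical address the CPU would jump to: segment * 16 + offset. */
uint32_t ivt_handler_address(const FakeLowMemory *mem, unsigned int_no)
{
    assert(mem != NULL && int_no < IVT_ENTRIES);
    const uint8_t *entry = &mem->bytes[int_no * IVT_ENTRY_SIZE];
    uint32_t offset  = entry[0] + 256u * entry[1];
    uint32_t segment = entry[2] + 256u * entry[3];
    return segment * 16u + offset;
}
```

Overwriting four bytes of such an entry with unrelated data is exactly the case where the next interrupt jumps to an arbitrary address, while writing a valid segment:offset pair gives a working handler.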

“Defined behavior” means rules of the game that are the same on all systems, across platforms. In the case of “undefined behavior”, special rules apply that can differ from platform to platform.

As C programmers, we are very interested in writing programs that run almost everywhere, and therefore we usually only want to work with defined behavior.

For example, C defines arithmetic on signed integers only as long as the result stays within the range of the type.
Unfortunately, we are so used to the binary nature of today's processors that we consider signed overflows to be “defined” even though they are not.
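One way to stay inside the defined range is to test for overflow before doing the addition instead of afterwards. The following is only a sketch; `checked_add_int` is a name made up for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *sum and returns true only if the
 * result fits into an int. The checks themselves never overflow. */
bool checked_add_int(int a, int b, int *sum)
{
    assert(sum != NULL);
    if (b > 0 && a > INT_MAX - b) {
        return false;               /* would exceed INT_MAX */
    }
    if (b < 0 && a < INT_MIN - b) {
        return false;               /* would fall below INT_MIN */
    }
    *sum = a + b;                   /* now guaranteed to be in range */
    return true;
}
```

A caller would treat a `false` return as an error instead of relying on whatever the processor happens to do on overflow.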

unsigned int x = (unsigned int) -1;

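I have often seen this code, which is supposed to set all bits in x to 1. On a 16-bit x86 system that gives a 16-bit value of all ones, and on 32-bit and 64-bit systems a correspondingly wider one, depending on the width of unsigned int. C compilers do it that way, usually without any additional effort, because the binary representation of -1 already has all bits set on today's machines.
Contrary to what one might expect from the rule above, the compiler is not free to set x to any other value here: the C standard defines conversion to an unsigned type as reduction modulo 2^N, so x is always UINT_MAX. What is not defined is overflow in signed arithmetic, and converting an out-of-range value to a signed type is left to the implementation.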
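For reference, here is a small sketch of what the C standard specifies for this line: a value converted to an unsigned type is reduced modulo UINT_MAX + 1, so the result is UINT_MAX whatever the width of `unsigned int`. The function names are invented for this illustration.

```c
#include <assert.h>
#include <limits.h>

/* Converting -1 to unsigned int wraps modulo UINT_MAX + 1. */
unsigned int all_bits_by_cast(void)
{
    return (unsigned int)-1;
}

/* The same value, written as the complement of an unsigned zero. */
unsigned int all_bits_by_complement(void)
{
    return ~0u;
}

/* Hypothetical self-check: both spellings give UINT_MAX, and unsigned
 * arithmetic wraps around to 0 instead of overflowing. */
void check_all_bits_set(void)
{
    assert(all_bits_by_cast() == UINT_MAX);
    assert(all_bits_by_complement() == UINT_MAX);
    assert(all_bits_by_cast() + 1u == 0u);
}
```

The signed counterpart, for example `INT_MAX + 1`, has no such guarantee and is the case to avoid.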

If we ever design quantum computers or other computing systems that are not based on today's binary logic, then such an operation may no longer be feasible or require additional instructions.

Colleagues have asked me why I use arithmetic operations in implementations of Base64 or other byte serializations instead of simply using type casts that truncate the bits.
And my answer was often: because arithmetic operations are precisely defined, whereas bit truncation through a cast is not in every case (for signed target types the result is left to the implementation).
And just because everything currently seems to be working correctly, it is not necessarily correct.
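A byte serialization in this arithmetic style might look like the following sketch. It is not taken from any particular Base64 implementation; `store_be32` and `load_be32` are hypothetical names, and the big-endian byte order is an assumption chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes value as 4 big-endian bytes using only
 * division and modulo, so every intermediate result is below 256 and
 * the final cast never has to truncate anything. */
void store_be32(uint32_t value, unsigned char out[4])
{
    assert(out != NULL);
    out[0] = (unsigned char)((value / 16777216u) % 256u);
    out[1] = (unsigned char)((value / 65536u) % 256u);
    out[2] = (unsigned char)((value / 256u) % 256u);
    out[3] = (unsigned char)(value % 256u);
}

/* Inverse operation: rebuilds the value from 4 big-endian bytes. */
uint32_t load_be32(const unsigned char in[4])
{
    assert(in != NULL);
    return (uint32_t)in[0] * 16777216u
         + (uint32_t)in[1] * 65536u
         + (uint32_t)in[2] * 256u
         + (uint32_t)in[3];
}
```

On common platforms an optimizing compiler will typically turn these divisions and modulo operations into shifts and masks, so the portable form usually costs nothing extra.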

Fortunately, I can largely rely on the fact that modern compilers understand my more complex calculation code and that the optimization phase generates high-performance machine code for the platform from it (without the additional calculations).

But even if not, for me correctness always comes before speed.
After all, what use is the fastest code if it doesn't produce correct results?

Conclusion: So friends, always pay attention to what is “defined” according to the language standard and try to avoid “undefined behavior”, unless you write drivers or OS parts that have to be tailored to very specific platforms and compilers.

Raymond Chen gave some nice examples of how compilers alter their code generation when they detect “undefined behavior”.