Undefined behavior remains an elusive subject. On the one hand, it can expose your program to dangerous failures and security exploits. On the other hand, it enables the speed and portability that the C programming language is well known for.
Let’s look at the most mundane example: integer overflow. Integer overflow happens when an arithmetic operation, such as adding two numbers, produces a result that is too large to fit into the number of bits reserved for its type. In C, overflow of a signed integer is undefined behavior, whereas unsigned arithmetic is defined to wrap around. A program with undefined behavior does not have to give a warning. Neither does it have to crash. In reality it could do anything. Yet the average program performs an incredible amount of integer arithmetic, so the potential for undefined behavior is truly staggering.
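To make this concrete, here is a minimal sketch of how a C program can stay clear of signed overflow when adding two int values. The helper name checked_add is hypothetical and not taken from any library; the idea is simply to test the operands against INT_MAX and INT_MIN before the addition, so that the overflowing expression is never evaluated.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: stores a + b in *sum and returns true only if the
   mathematical result fits in an int. The range tests run before the
   addition, so an overflowing a + b is never evaluated. */
bool checked_add(int a, int b, int *sum)
{
    if (b > 0 && a > INT_MAX - b)
        return false;   /* a + b would exceed INT_MAX */
    if (b < 0 && a < INT_MIN - b)
        return false;   /* a + b would fall below INT_MIN */
    *sum = a + b;
    return true;
}
```

Writing such a check by hand at every addition is exactly the cost that most programs are unwilling to pay. GCC and Clang provide __builtin_add_overflow for the same purpose, and C23 adds ckd_add in <stdckdint.h>.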
In the Java programming language, which attempts to ensure that any valid program construct is well-defined, integer overflow is defined to behave as wrap-around (modular) arithmetic – just forget about the excess bits. That is a bad trade-off, because it chooses speed over the dignified, but incredibly slow, throwing of an exception. The designers of Java must have reasoned that the cost of checking for overflow at every integer computation is too high. In addition, wrap-around arithmetic appears to be free, because for today’s processors it is the default behavior.
Although wrap-around arithmetic appears to be free, in reality it is not free at all. When performing compiler optimizations, for example, it is far preferable to reason with ordinary, unbounded arithmetic than with wrap-around arithmetic. With ordinary arithmetic, the value of i+1 is always greater than the value of i. With wrap-around arithmetic this is not always true: at the largest representable value, i+1 wraps to the smallest one. As a result, certain perfectly valid optimizations are not possible, and the program runs slower than needed. In addition, wrap-around arithmetic makes static analysis of, and formal reasoning about, the properties of a program much more difficult. In fact, it stands in the way of showing program correctness.
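The i+1 example can be sketched in C as follows. Both function names are made up for illustration. Because signed overflow is undefined, a compiler is allowed to assume it never happens and may reduce the signed version to a constant; whether it actually does so depends on the compiler and its optimization settings. The unsigned version has defined wrap-around, so the comparison has to be kept.

```c
#include <limits.h>

/* Signed: overflow is undefined, so a compiler may assume that i + 1
   never wraps and simplify this function to "return 1". */
int next_is_greater_signed(int i)
{
    return i + 1 > i;
}

/* Unsigned: wrap-around is defined, so the comparison must stay.
   For i == UINT_MAX, i + 1 is 0 and the function returns 0. */
int next_is_greater_unsigned(unsigned int i)
{
    return i + 1 > i;
}
```

Calling the signed version with INT_MAX is itself undefined behavior, which is why that case is not shown as a usage example.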
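The more recent Zig and Rust programming languages have made a better choice than Java. In these languages, an unintended integer overflow is treated as a program error rather than being defined to wrap around. In Zig, overflow of the ordinary arithmetic operators is caught by a run-time check in safety-checked builds and is undefined in the unchecked release modes, much as it is in C. In Rust, overflow panics in debug builds, although release builds wrap by default. The reasoning is that it does not matter whether the result is a wrap-around value or undefined. If you failed to anticipate an integer overflow, the program is equally wrong in both cases. If you do intend to use wrap-around arithmetic, you can easily write that down explicitly, for example with Zig’s +% operator or Rust’s wrapping_add method, without relying on a general arithmetic behavior imposed by the programming language.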
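C also lets you ask for wrap-around explicitly, by doing the computation in an unsigned type. The sketch below uses a hypothetical helper named wrapping_add_u32; it is an illustration of the idea, not part of any standard library.

```c
#include <stdint.h>

/* Hypothetical helper: wrap-around addition, requested explicitly.
   The result is reduced modulo 2^32 when it is converted to uint32_t,
   so no undefined behavior is involved. */
uint32_t wrapping_add_u32(uint32_t a, uint32_t b)
{
    return a + b;
}
```

A checksum or hash function that relies on modular arithmetic can be built from helpers like this one, which makes the intent visible in the code instead of leaving it to the default behavior of the signed types.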
So perhaps some choices in C are not so bad after all. The wonderful thing about C is that its quirky behaviors are well known and there are multiple guidelines and an incredible ecosystem of tools that help you avoid them. In SuperTest we have specific test suites to validate a compiler’s correct implementation of C’s arithmetic, no matter what the underlying data model is.
Dr. Marcel Beemster, CTO