Undefined behavior can literally erase your hard disk
It is no secret that C and C++ are chock-full of pitfalls of undefined behavior. The C++ standard describes it as:
behavior for which this International Standard imposes no requirements
Modern compilers, with optimizations on, often assume that undefined behavior is never invoked in the program and generate code under that assumption. When a program actually does invoke undefined behavior, the assumption is violated, often producing strange or even outright paradoxical results. (I like to call those cases “paranormal behavior”, but it still hasn’t caught on.)
An example
When warning newbies against code that invokes undefined behavior, some people like to say that the compiler is free to do anything, including “making demons come out of your nose” or “emitting code that erases your hard disk”.
Now, while the summoning of demons is a non-trivial I/O operation that would rarely be accidentally emitted by production compilers, Twitter user @andreasdotorg shared a concise example where undefined behavior literally erases your hard disk (if it’s run under a GNU/Linux OS and with appropriate privileges):
— andreasdotorg (@andreasdotorg) September 1, 2017
I found it interesting, but I sent it to a few people and they didn’t quite get it — so I’ll go through it step-by-step here.
The code
The source code is as follows:
[C++ source listing not preserved.]
A quick explanation of what we’re looking at:
- We have a function EraseAll() that will erase your hard disk when it gets called.
- We have a function pointer Do that is not explicitly initialized at the beginning of the program.
- We have a function NeverCalled() that would have initialized Do to point to EraseAll().
By examining main(), we can easily deduce that NeverCalled() is indeed never called, so Do is never explicitly assigned a value during the entire execution of the program. With that in mind, it would seem perfectly legal for the compiler to throw away the functions NeverCalled() and EraseAll() from the resulting binary, because they are both unreachable by valid code.
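The program described above can be sketched roughly as follows. This is a reconstruction from the description, not the original listing: EraseAll() is replaced by a harmless stand-in that prints a message and returns an arbitrary sentinel, and the hypothetical function CallDo() stands in for the body of main() so that the snippet can be exercised safely.

```cpp
#include <cstdio>

typedef int (*Function)();

// Static storage duration, no explicit initializer.
static Function Do;

// Harmless stand-in (assumption): the function described in the text
// would erase the disk here instead of printing a message.
static int EraseAll() {
    std::puts("EraseAll() was called");
    return 42;  // arbitrary sentinel value
}

// The only place in the program that ever assigns Do.
void NeverCalled() {
    Do = EraseAll;
}

// Hypothetical stand-in for the body of main(): calls through Do without
// checking it. If NeverCalled() has not run first, Do is null and this
// call is undefined behavior.
int CallDo() {
    return Do();
}
```

In the program under discussion, the call through Do sits directly in main() and nothing ever invokes NeverCalled(), so the only execution path is the undefined one.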
gcc
The GNU Compiler Collection frontend gcc (let’s say 5.1) with -Os -std=c++11 -Wall emits relatively straightforward code:
[gcc assembly listing not preserved.]
The unreachable functions EraseAll() and NeverCalled() did not actually get removed, but otherwise the entire listing seems very predictable.
clang
Now, if we compile this using the LLVM C++11 frontend clang 3.4.1 (and onwards) with -Os -std=c++11 -Wall, we get this assembly listing instead:
[clang assembly listing not preserved.]
In C++, this would be:
[Equivalent C++ listing not preserved.]
Strangely, not only did EraseAll() not get thrown away, it actually got inlined in main(). In other words, our entire program got replaced by a function that should, by all rights, have been unreachable.
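The effect described here can be sketched in source form. This is an illustration of the described outcome, not a transcription of the compiler output: EraseAll() is again a harmless stand-in, EffectiveMain() is a hypothetical name standing in for main(), and the empty NeverCalled() is an assumption consistent with the idea that its only effect is treated as already in place.

```cpp
#include <cstdio>

// Harmless stand-in (assumption) for the disk-erasing function.
static int EraseAll() {
    std::puts("EraseAll() was called");
    return 42;  // arbitrary sentinel value
}

// Assumed to be reduced to an empty function: the pointer it used to
// assign is no longer needed once the call below is direct.
void NeverCalled() {}

// Hypothetical stand-in for main(): the indirect call through the
// function pointer has become a direct call to EraseAll().
int EffectiveMain() {
    return EraseAll();
}
```

Note that the function pointer no longer appears at all: once every call through it is known to reach EraseAll(), the variable itself is dead.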
According to the C++ standard, the function pointer’s value is a null pointer value initially:
If constant initialization is not performed, a variable with static storage duration or thread storage duration is zero-initialized.
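A minimal sketch of what this rule means for our function pointer, using a hypothetical helper DoIsNull(). Reading and comparing the pointer is well-defined; only calling through it while it is null is not.

```cpp
typedef int (*Function)();

// No initializer: because Do has static storage duration, it is
// zero-initialized, which for a pointer means the null pointer value.
static Function Do;

// Hypothetical helper: inspects the pointer without calling through it.
bool DoIsNull() {
    return Do == nullptr;
}
```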
We know that calling through a null function pointer invokes undefined behavior. As we said in the beginning, an optimizing compiler may work under the assumption that the program contains no constructs that invoke undefined behavior. Let’s follow that logic and see where it gets us.
- If there is no undefined behavior, then either Do is not used in this program, or Do is not null at the moment when it’s used. Most compilers would pick the second assumption.
- In order for Do to not be null, it must have been initialized with some other value.
- The only thing that initializes Do is the function NeverCalled().
- In a program free of undefined behavior, whenever and wherever Do() is called for the first time, NeverCalled() must have been called beforehand.
- Do() is called at the beginning of the program.
- Thus, the effects of NeverCalled() must be in place before Do(), and therefore before the beginning of the program.
- There is no other value ever assigned to Do, therefore NeverCalled() is idempotent in this program.
- NeverCalled() must be implicitly “called” before the beginning of the program, and any other call to it can be a no-op.
- EraseAll() can be inlined at all places where Do() is called. This means that we can conveniently have our hard disk erased without even paying the performance penalty of an extra function call!
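For contrast, here is a sketch of how this chain of reasoning breaks if the program checks Do before calling through it. The null case then has defined behavior, so the compiler can no longer assume it away and has to keep the check. The name CallDoChecked() and the fallback value -1 are hypothetical, and EraseAll() is a harmless stand-in.

```cpp
#include <cstdio>

typedef int (*Function)();

// Zero-initialized: starts out as a null pointer.
static Function Do;

// Harmless stand-in (assumption) for the disk-erasing function.
static int EraseAll() {
    std::puts("EraseAll() was called");
    return 42;  // arbitrary sentinel value
}

void NeverCalled() {
    Do = EraseAll;
}

// Hypothetical checked call: a null Do is handled explicitly, so every
// path through this function has defined behavior.
int CallDoChecked() {
    if (Do == nullptr) {
        return -1;  // defined fallback instead of a call through null
    }
    return Do();
}
```

With the check in place, the first step of the list no longer holds: "Do is null when it’s used" is now a legitimate state of the program instead of an impossibility.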
And that’s the line of reasoning that could lead a very intelligent and perfectly standards-compliant compiler to produce this hazardous binary.
Epilogue
This was a nice little demonstration of how undefined behavior can bite you under the right circumstances: it can actually replace your program with code that should have been unreachable!
I also find it interesting that clang performs this dramatic transformation, while gcc does not. Hopefully, in a future post we will take a closer look at the model of undefined behavior employed throughout LLVM.
Author Theodoros Chatzigiannakis
LastMod 2017-09-04