Today I had a nice WTF moment. I was looking into when a program built with gcc prints (null) rather than segfaulting when it hands a NULL pointer to printf. I knew that newer gcc versions do this in some situations, but it turns out to be more complex than I initially thought.
So here we have 5 little test programs:
#include <stdio.h>
int main(void){
return 0;
}
#include <stdio.h>
int main(void){
return 0;
}
#include <stdio.h>
int main(void){
return 0;
}
#include <stdio.h>
int main(void){
return 0;
}
#include <stdio.h>
int main(void){
return 0;
}
Now I expected some kind of consistent behaviour, at least apart from the last snippet, since from the C programming point of view the source does exactly the same thing. But it seems this is not the case.
The first snippet is straightforward: the generated code passes a NULL pointer to puts, which dereferences it, and the result is a segfault. But wasn't it printf from the GNU libc that replaces such cases with (null)? Yes, it was, but it turns out only in some cases. Now here is the fun part. If we look at the generated code for the first example we see:
00000000004004ec <main>:
4004ec: 55 push %rbp
4004ed: 48 89 e5 mov %rsp,%rbp
4004f0: bf 00 00 00 00 mov $0x0,%edi
4004f5: e8 e6 fe ff ff callq 4003e0 <puts@plt>
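Huh? puts? Now that is interesting. It seems gcc sees the format string "%s\n" followed by a pointer argument, decides that puts would produce the same output, and replaces the printf call with it. Unlike glibc's printf, puts does not substitute (null) for a NULL pointer, hence the segfault.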
Now for the second code snippet this is not the case:
00000000004004ec <main>:
4004ec: 55 push %rbp
4004ed: 48 89 e5 mov %rsp,%rbp
4004f0: be 00 00 00 00 mov $0x0,%esi
4004f5: bf fc 05 40 00 mov $0x4005fc,%edi
4004fa: b8 00 00 00 00 mov $0x0,%eax
4004ff: e8 dc fe ff ff callq 4003e0 <printf@plt>
In this case gcc sees the format string and an integer argument, so it can't hand that to puts in any way that makes sense. printf is used and the result is (null).
Up to this point the behaviour is somewhat predictable, at least once you know about the substitution.
But it gets even stranger. The third and the fourth example both result in the usage of printf and therefore the displayed result is (null). My impression is that gcc tests for exactly "%s\n" (as puts prints a newline at the end anyway), so these two examples don't segfault either. Only when the format string is "%s" followed by the newline does gcc, or at least that's how it looks to me, fold the format string and the pointer into a single puts call.
In the last case the newline is present again. However, there is leading text in front of the %s in the format string. Here gcc does not treat the whole thing as one string that could be passed to puts. It uses printf again and the result is bla: (null).
I have no idea what the reason behind this behaviour is. I guess the gcc people have good arguments for it. But honestly, it SUCKS and is highly inconsistent, *grrr*. The behaviour isn't even consistent between different gcc versions.
The above results were tested with gcc (Debian 4.3.4-2) 4.3.4. I also tested with gcc (GCC) 3.4.3; in this case all of the above examples result in a segmentation fault (not sure when this (null) replacement feature in the glibc was introduced though). You can also disable this "optimization" by using -fno-builtin-printf btw.