Wednesday, June 1, 2011

Don't initialize that big static data! (C++)

I only learned this recently, so I assume a lot of people don't know it either.
Let's take a look at some sample code:

static const unsigned int _1mb = 1024*1024*1;
char a[_1mb*10];
int main() {}


static const unsigned int _1mb = 1024*1024*1;
char a[_1mb*10] = {7};
int main() {}


Now what's the difference between those two programs? The only difference is that the second one initializes a[0] to 7. Both programs initialize the rest of 'a' to 0.
In the first one, all of 'a' gets initialized to 0 because static/global data that is not explicitly initialized is zero-initialized (the C and C++ standards guarantee this for objects with static storage duration).
In the second one, the first element of 'a' gets initialized to 7 because we said so in the array initializer, and the rest is filled with 0's because when an array initializer does not cover all of the elements, the remaining elements are set to 0.
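
If you want to convince yourself of those two rules, here is a minimal sketch that checks them at runtime. The names kSize, zeroed, partial and count_nonzero are made up for this example, and the arrays are kept small on purpose; the rules are the same for a 10 MB array.

```cpp
#include <cstddef>

// Small, hypothetical size just for this check.
static const std::size_t kSize = 1024;

char zeroed[kSize];         // no initializer: every element starts as 0
char partial[kSize] = {7};  // element 0 is 7, the remaining elements are 0

// Counts how many bytes in buf[0..n) are non-zero.
std::size_t count_nonzero(const char* buf, std::size_t n) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (buf[i] != 0) ++count;
    }
    return count;
}
```

Calling count_nonzero(zeroed, kSize) should give 0, and count_nonzero(partial, kSize) should give 1, with the single non-zero byte being partial[0].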

Now why do we care about this?
Well, why don't you try compiling both of those programs and take a look at the executable files?
You might be surprised to find that the first executable is just a few KB, while the second is over 10 MB!

Why is this?
Well, it turns out that compilers/linkers put static data without an explicit initializer in its own section, called the bss section. This section holds data that starts out as all 0's. This means the executable doesn't need to store that data in the file; it only needs to record how big the section is and where it goes in memory. The memory is then filled with 0's when the application is loaded.

In contrast, the second example explicitly initializes the data. Now the compiler/linker places it in the data section and stores the full contents in the executable. This means that when you have an explicitly initialized 10 MB chunk of data, your executable file grows by 10 MB!

The exception to the above rule is data that is explicitly initialized to 0. If we had "char a[_1mb*10] = {0};", what would happen?
Well, it turns out it can be put in either the bss section or the data section. Apparently some compilers/linkers might be stupid enough to put it into the data section, meaning the executable would still grow by 10 MB. Thankfully MSVC isn't that stupid and doesn't do that (the built executable is the same as if you hadn't explicitly initialized 'a').


Anyways, the moral of the story is to keep the above in mind when you explicitly initialize large static data.
So what would be a way to set a[0] to 7 at program startup without bloating the executable by an extra 10 MB?
Well, this is one way:

static const unsigned int _1mb = 1024*1024*1;
char a[_1mb*10]; // no initializer, so 'a' stays in the bss section
struct init_a {
    init_a(char* c) { c[0] = 7; } // runs during static initialization, before main()
} init_a_instance(a);
int main() {}


Now init_a_instance's constructor will end up setting a[0] to 7 at application startup, like we wanted. (Note: remember to beware of the static initialization order fiasco in C++ if other static variables' constructors use 'a', especially ones in other source files, where the order of initialization is not defined.)
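
If the initialization order problem worries you, one common workaround is to hide the buffer behind an accessor function that does the setup the first time it is called. This is just a sketch, not the only way to do it: get_a and kOneMb are names I made up for the example, and the lambda assumes a C++11 compiler.

```cpp
#include <cstddef>

static const std::size_t kOneMb = 1024 * 1024;

// No initializer, so the buffer is zero-filled and costs nothing in the file.
static char a[kOneMb * 10];

// Hypothetical accessor: sets a[0] on the first call, then just returns 'a'.
// The function-local static is initialized exactly once, on first use.
char* get_a() {
    static const bool initialized = [] { a[0] = 7; return true; }();
    (void)initialized;  // silence unused-variable warnings
    return a;
}
```

Any code that needs the buffer, including other static constructors, calls get_a() instead of touching 'a' directly, so it always sees a[0] == 7 no matter which static object gets constructed first.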