Thursday, January 6, 2011

Using and understanding the static keyword in C++, as well as some interesting tricks

There are so many things I could talk about here that it's difficult to condense them into a small blog post. I would like to rant about how much I hate c++'s header/.cpp translation unit model, but I won't, in order to keep this shorter.

The first thing to mention is that when you compile a c++ project with multiple .cpp files, each .cpp file is compiled separately: every .h file it includes (directly or indirectly) is expanded into it, and the result is a translation unit which the compiler uses to generate an object file. A translation unit doesn't know about the other .cpp files, which is why you need function prototypes, class declarations, and extern variable declarations (such as 'extern int x;') that tell the compiler these functions/classes/variables exist in some other .cpp file, leaving it to the linker to find them. (This crappy design is the worst thing about c++, and it was inherited from the c programming language, most likely so that the compiler could save memory when compiling your source code. Newer languages like c# don't need header files because their compilers can find the functions/classes/variables in separate files without prototypes.)

So because of the above design, we need to use crappy header files in our large c++ projects. There are some tricks I've learned over the years that eliminate much of the duplicated prototype/declaration/definition mess, although they're not commonly used by other c++ coders, who probably won't like them because they're not the usual way to program. I will probably talk about this in more detail in another article.

Now why is the static keyword useful? Well, c++, being the evil bastard that it is, gives the static keyword different meanings depending on where it's used. So just talking about 'static' in general is not specific enough.

One use of static is at file scope. This tells the compiler that the variable/function has internal linkage: it is unique to the current translation unit (.cpp file) and not visible to other .cpp files.

So if you have:

// source1.cpp
#include <iostream>
using namespace std;
static int x = 10;
int main() { cout << x << endl; return 0; }
// source2.cpp
extern int x;
void foo() { x++; }


The above code will compile, but then give you a linker error:
source2.obj : error LNK2001: unresolved external symbol "int x" (?x@@3HA)

If you remove the static keyword, then the program will link. The reason, as mentioned before, is that static makes the variable unique to its translation unit, essentially hiding it from other units.

Static variables declared at global file scope (not in a function) are initialized at the start of the program and destroyed at the end of the program.
So if you had a line of code that said "static someClass myClass(10);" at global file scope, then myClass's constructor would be called sometime before the main() function in your program, and its destructor would be called when the program exits.

One misleading thing c++ does is change this behavior when you have static variables nested in function scope.

int foo() {
    static someClass myClass(10);
    return myClass.someFunct();
}


Now you might expect myClass's constructor to be called at the beginning of the program, but this is not the case. Instead, the constructor is called the first time the foo() function is called (more precisely, the first time execution reaches the declaration)! This is a tricky part of the standard that may cause confusion at first.

The interesting thing about this behavior is that although it's tricky and seemingly evil, it is also useful. Specifically, it lets us know when the nested static variables are initialized. When static variables are initialized at global file scope, the order in which they get initialized relative to static variables in other .cpp files is unspecified (within a single .cpp file they are initialized in the order they are defined). This is one reason people think they're evil: they can be initialized before the data they depend on is initialized. Using nested static variables eliminates this problem because we know the constructor of the static variable is called the first time we call the function foo().

Now to take advantage of this we can declare static variables like this:

// Instead of doing this:
// static someClass myClass(10);

// We do this:
someClass& myClass() {
    static someClass _myClass(10);
    return _myClass;
}


Now we can use 'myClass()', and we know that the constructor runs the first time it's called.


Now moving on, we can have static functions as well.
Static functions behave like static variables at file scope: each .cpp file where the function is defined gets its own unique copy, which is hidden from the other .cpp files.
You might think it doesn't matter, and for the most part it doesn't, but imagine the following code:

// header.h
static int& getInt() {
    static int i = 77;
    return i;
}

// source1.cpp
#include <iostream>
#include "header.h"
using namespace std;
void foo();

int main() {
    foo();
    cout << getInt() << endl;
    return 0;
}

// source2.cpp
#include "header.h"
void foo() {
    getInt() = 10;
}


What do you think the output of the program will be?
You might think the output should be 10, but it's not; it's 77.

The reason for this is that we declared the function getInt() as static, which means that source1.cpp and source2.cpp each get their own separate local version of getInt(), each with its own static variable i. So when getInt() is called in source2.cpp's foo(), it is a different function from the getInt() that source1.cpp sees, and when source1.cpp prints getInt() it prints its own version, which was initialized to 77.

What happens if we remove the 'static' keyword from 'static int& getInt()'? Well, we get linker errors:
1>source1.obj : error LNK2005: "int & __cdecl getInt(void)" (?getInt@@YAAAHXZ) already defined in source2.obj

This is because the function is now defined in both .cpp files with external linkage, so the linker sees two definitions of the same function.
The usual way to prevent these problems is to define the function in one .cpp file, and then have the function prototype in the header like so:

// header.h
int& getInt();

// source1.cpp
#include <iostream>
#include "header.h"
using namespace std;
void foo();

int& getInt() {
    static int i = 77;
    return i;
}

int main() {
    foo();
    cout << getInt() << endl;
    return 0;
}

// source2.cpp
#include "header.h"
void foo() {
    getInt() = 10;
}


The program now prints what you expect it to print, "10".


Now that we got that out of the way, we can talk about the static keyword in classes and structs.

When you have a struct/class with a static member variable, you're required to declare it in the struct, but then define it outside the struct.
So you do:

// header.h
struct someStruct {
    static int i; // declare
    // static int i = 10; // c++ doesn't allow this
};

// source1.cpp
#include "header.h"

// This line must go in only 1 translation unit
// so it should only be put in one .cpp file
int someStruct::i = 10; // define i


There will now be only 1 instance of someStruct::i, and you can use it throughout your code such as 'someStruct::i = 49'.

As you may have noticed, c++ makes you split the definition of the variable from its declaration, and the definition goes into only 1 .cpp file.
Well, there is a workaround for this which lets you define it in the same place you declare it.

Before I mention the solution, I'll mention another problem with c++.
What if you wanted to do this:

struct someStruct {
    static const int i = 24; // This compiles
    static const char* message = "Hello world"; // This doesn't
    void foo() { cout << message << " " << i << endl; }
};


The only things c++ allows us to initialize within a class are static const members of integral or enumeration type (such as "static const int"); for anything else it requires you to split the initialization part into a .cpp file.

Now what's the solution?
Static member functions with static local variables!

Here's a cool way to fix the problem:

struct someStruct {
    static const int i = 24;
    static const char* message() {
        static const char* _message = "Hello world";
        return _message;
    }
    void foo() { cout << message() << " " << i << endl; }
};


The evil part about c++ is that static member functions with static local variables behave differently than static file-scope functions with static local variables.

This example will illustrate that:

// header.h
static int& getInt() {
    static int i = 77;
    return i;
}

struct someStruct {
    static int& getInt() {
        static int i = 77;
        return i;
    }
};

// source1.cpp
#include <iostream>
#include "header.h"
using namespace std;
void foo();

int main() {
    foo();
    cout << getInt() << endl;
    cout << someStruct::getInt() << endl;
    return 0;
}

// source2.cpp
#include "header.h"
void foo() {
    getInt() = 10;
    someStruct::getInt() = 10;
}


What do you think the output of the program will be?
Well, we know the first cout, with the file-scope function, will print '77', because this is the same example I gave earlier in this article. But strangely, the second line with 'someStruct::getInt()' prints 10!

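The reason for this is that static local variables in class member functions will only have 1 instance of themselves, even throughout different translation units! A member function defined inside the class body is implicitly inline and has external linkage, so every translation unit refers to the same function and therefore the same static local. (On a member function, 'static' only means the function needs no object; it has nothing to do with linkage.)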

This is a powerful thing to know and can be used for random tricks.

This last example I'm going to give shows a trick that lets you declare and define static variables in a header file that is used in different .cpp files, while the variable stays the same throughout all the translation units (so it doesn't make unique copies like a normal file-scope static does).

// header.h
struct someStruct {
    static int& getInt() {
        static int i = 77;
        return i;
    }
};

static int& i = someStruct::getInt();

// source1.cpp
#include <iostream>
#include "header.h"
using namespace std;
void foo();

int main() {
    foo();
    cout << i << endl;
    return 0;
}

// source2.cpp
#include "header.h"
void foo() {
    i = 10;
}



The above code prints out 10.
Notice how we didn't have to declare any variable in the individual .cpp files and could declare and initialize the variables all in the header file.
This trick can be useful, so use it wisely.

Hope this article was useful for at least someone :p

6 comments:

  1. Thanks it was very helpful.
    What makes C++ very cool is the amount of tips and tricks that can be found from only careful reading of the ISO standard.

  2. I do like that there are a lot of cool tricks you can do with c++; it's part of why it's so fun.
    The language though is pretty evil because many things do not behave as expected or have undefined results.
    C++ stupidly leaves some things as "up to the compiler" in order to promote faster code in different architectures, but effectively it means that your code can do different things depending on the compiler+architecture you compile it for. Who cares about speed if your code gets broken as a result!?
    Because C++ stupidly favors small speedups instead of consistency in many cases, it is one of the reasons the language is so dangerous.
    These things are stuff that the newer c++0x standards should try to fix, but the idiots designing the language are just bloating it with more crap, instead of fixing the bad things c++ currently does.
    Some of the new c++0x features are fun and useful, but on a whole it looks like it will just complicate the language a lot more, and not help with the REAL issues c++ has.

    Also I haven't carefully read the c++ standard, most of the stuff I learn is from talking to other c++ programmers, experimenting with code, and reading msdn or other online articles.
    I also think its more important to understand what the popular compilers do (msvc and gcc), instead of understanding the standard itself.
    The reason is that "who cares what the standard says if the compilers you're using do something different."
    Some people instead choose to only go by what the standard says; but no compiler out there implements the full c++ standard 100% flawlessly, so if you code purely by what the standard says, you will run into code that doesn't compile or work as expected. Also as mentioned the standard leaves many things up to the compiler, so in that case the standard doesn't tell you what it will do.
    Anyways its important to understand both the standard and what compilers end up doing I guess is what I'm trying to say.

    And glad you thought the article was helpful.

  3. http://stackoverflow.com/questions/943280/difference-between-static-in-c-and-static-in-c/943389#943389
    This is what I mentioned before.

    "These things are stuff that the newer c++0x standards should try to fix, but the idiots designing the language are just bloating it with more crap, instead of fixing the bad things c++ currently does."
    Perfectly true, but the reason "proper" changes aren't seen is that they either conflict with C standards/older C++ codebases, or they are simply proposals wading through bureaucratic sludge. My favorite example: Bjarne suggested standardizing nested templates something like 11 years ago, but it is only going to appear in c++0x (ie. std::map> instead of std::map >).

    "Some of the new c++0x features are fun and useful, but on a whole it looks like it will just complicate the language a lot more, and not help with the REAL issues c++ has."
    Again perfectly true (except for small things, like nullptr, for instance...). Anyways, my view is that most coders/codebases already make do with the c++ compilers which are available, so why not just enjoy the new features? That, or become a large company with influence on the c++0x committee :P

  4. haha, your blog has some parsing errors...
    I meant: http://pastie.org/1455161

  5. shuffle2: thanks for the link, i didn't know about the anonymous namespace trick. i will have to fiddle around with namespaces sometime and see if i can find any other cool uses for them (i currently rarely ever use them).

    Also, although the guy says that use of static is deprecated, it's not going to go away anytime soon. It's too over-used and the C++ committee doesn't like breaking compatibility with C code.

    "Perfectly true, but the reason "proper" changes aren't seen is that they either conflict with C standards/older C++ codebases, or they are simply proposals wading through bureaucratic sludge."

    There was a guy on efnet's #c++ that was apparently a member of the c++0x committee.
    I bitched at him that the committee are just introducing a bunch of random features instead of fixing real problems.
    He said that most of the committee really hates to introduce new keywords because it could break backwards compatibility, so that's why they keep reusing existing keywords and symbols.

    I think the c++ language is going to be doomed in 10 years due to the crappy committee they have making bad decisions.

    "Anyways, my view is that most coders/codebases already make do with the c++ compilers which are available, so why not just enjoy the new features? That, or become a large company with influence on the c++0x committee :P"

    Yeh i am going to enjoy the new features. its just a shame that the real problems don't get solved.
    It really sucks when you have code like:
    cout << foo1() << foo2(); // The order foo1 and foo2 are called in is unspecified

    Yet the c++ committee does nothing to solve these problems (speed is more important than well defined working code right!? :/)

    Also:
    My blog does have problems with '<' and '>' because it thinks you're giving it an html tag. You instead have to use 'Xlt;' and 'Xgt;' (replace 'X' with '&')
    Really sucks when i post template code here because i have to make all the modifications so it shows up correctly xD

  6. effective program that tell why we use Static Member function in cplusplus in this link
    http://geeksprogrammings.blogspot.com/2013/09/static-member-functions.html
