«C programmers think memory management is too important to be left to the computer. Lisp programmers think memory management is too important to be left to the user.» Ellis and Stroustrup, The Annotated C++ Reference Manual.
I’m a C++ developer; that’s where I’m getting paid from. I started with C++ early. My first step in programming was Harvard’s fantastic CS50 online course, initially in C, and then I moved on to two more sources: Dan Grossman’s Programming Languages materials and the CS106B Programming Abstractions materials from Stanford. This last course was taught in C++, and it clicked with me for no particular reason; perhaps just the stubbornness of doing things the hard way, the arrogance of thinking that Java and C# are unnecessary simplifications, who knows. Anyway, it clicked with me, and I stayed with it.
I have worked with old C codebases 1 and I have worked with interoperability between runtimes. I’ve seen C++ done the C way, and .NET interop with native libraries the… no, no reasonable way. And there is one pattern I keep seeing all over, at least in my admittedly limited experience in the field: Resource Management is most often a mystery.
The part that pains me the most is seeing C++ done the C way: it makes me think we never got the zero-cost abstractions that C++ offered over C, and that we walked away thinking OOP was the whole C++ deal. The part that worries me is seeing garbage-collected interop code not being understood, and the Garbage Collector left as a happy enigma.
So, what is resource management, and how is it done?
Resource Management is the problem of, well, managing resources. Computers have a limited set of resources. Programs are made to believe resources are virtually infinite, but in reality there are walls that can be hit. Memory is the most obvious one, and without loss of generality we can reduce the whole problem to it. There’s also the problem of file handles – Operating Systems have a limit on how many files they can have open at once – or the sockets in a connection, you name it. Resources are acquired and, for the health of the system, must not be misused: give them back as soon as you don’t need them any more. Best case, only you suffer; worst case, you can take an entire system down – hopefully, a proper Operating System will simply kill your abusive program in order to protect the rest of the system.
But returning resources is not that simple. It’s clear that if you don’t return things you no longer need, you have a resource leak. But if you return something while you still need it, you have a runtime error, as your program expects a resource to be available and doesn’t understand why it is not there. And there’s the chance of returning a resource twice: the second time you ask for a resource to be returned, the system will be a bit lost: what are you trying to return to me? Hmm… you must be out of your mind, I’d better kill you now before you make a bigger mess – again, that’s what a good OS will hopefully do. You have a worse problem if it doesn’t.
Time to get technical.
Without loss of generality, I’ll reduce the case to memory management. Once memory is acquired, there are three dangers that arise:
- forget-to-free: This is where everything starts. Don’t return memory, and your system will run out of it. Performance can also be severely hurt: not just because at some point memory will start being swapped to the vastly slower disk, but also because your allocators will suffer from maintaining an unnecessarily large heap.
- use-after-free: free a resource too early, that is, free it and then try to use it again. What happens here depends on your system and your underlying runtime. You might have programmed your system to throw an exception and recover what was lost, if recovery is even possible, with the huge performance penalty that comes with it. In an unprotected world like that of C and C++, dereferencing an invalid pointer might make the kernel segfault your program if your allocator actually returned that memory to the OS: in that case it is not yours any more, and if you touch it the kernel will kill you. If it is still yours, it might have been overwritten with garbage or handed to a different piece of your system, which leads to memory corruption and to wishing for a crash as quickly as possible, so that you realise your system went nuts before it sets the world on fire.
- free-after-free: return a resource, and then return it again. This depends again on your system and your underlying runtime. A paranoid checker could make sure this memory belongs to you, but such checks can be extremely expensive to compute. A naive, performance-oriented runtime would just add whatever you’ve told it to free to its list of free memory. But if you already freed this memory before, someone else might have obtained it through a different acquisition request in the meantime: when you free it a second time, you have basically corrupted someone else’s memory. Another crash coming.
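To make the three dangers concrete without actually invoking undefined behaviour, here is a minimal sketch of the "paranoid checker" idea. It is a toy, not how any real allocator works: `CheckedPool`, `Status` and `three_dangers_demo` are hypothetical names made up for this illustration, and resources are plain integers addressed by index instead of raw pointers. With real `malloc` and `free` the last two cases would not return a polite error code; they would simply be undefined behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical outcome codes for this toy; a real allocator reports none of this.
enum class Status { Ok, UseAfterFree, DoubleFree, InvalidHandle };

// Toy "paranoid" resource table. Slots are never recycled, which is what
// keeps stale handles detectable (and what a real allocator cannot afford).
class CheckedPool {
public:
    using Handle = std::size_t;

    // Acquire a new slot and return the handle that owns it.
    Handle acquire(int initial) {
        slots_.push_back(Slot{initial, true});
        return slots_.size() - 1;
    }

    // Write through a handle, refusing if the slot was already released.
    Status write(Handle h, int value) {
        if (h >= slots_.size()) return Status::InvalidHandle;
        if (!slots_[h].live) return Status::UseAfterFree;
        slots_[h].value = value;
        return Status::Ok;
    }

    // Release a slot, refusing if it was already released once.
    Status release(Handle h) {
        if (h >= slots_.size()) return Status::InvalidHandle;
        if (!slots_[h].live) return Status::DoubleFree;
        slots_[h].live = false;
        return Status::Ok;
    }

    // Slots still held: anything non-zero at shutdown is a forget-to-free.
    std::size_t leaked() const {
        std::size_t n = 0;
        for (const Slot& s : slots_) {
            if (s.live) ++n;
        }
        return n;
    }

private:
    struct Slot {
        int value;
        bool live;
    };
    std::vector<Slot> slots_;
};

// Walks through the three dangers in order.
void three_dangers_demo() {
    CheckedPool pool;
    CheckedPool::Handle a = pool.acquire(1);
    CheckedPool::Handle b = pool.acquire(2);
    (void)b;                                            // b is deliberately never released

    assert(pool.write(a, 10) == Status::Ok);            // normal use
    assert(pool.release(a) == Status::Ok);              // normal release
    assert(pool.write(a, 11) == Status::UseAfterFree);  // use-after-free
    assert(pool.release(a) == Status::DoubleFree);      // free-after-free
    assert(pool.leaked() == 1);                         // forget-to-free (b)
}
```

The checks are cheap here only because the pool never hands the same slot out twice. As soon as freed slots are reused, a stale handle becomes indistinguishable from a valid one, which is exactly the situation described in the list above.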
Nothing here should be new to any intermediate programmer – you seriously have a problem if this is new to you in a production environment!
Note the repeating pattern: in use-after-free just before you reuse it, someone else might have acquired it, and in free-after-free you freed something someone else could have acquired in the meantime. There’s an idea of multiple players and ownership of resources: he who expects to own a resource shall not be surprised by the wrong-doings of those who were not supposed to know about it!
Don’t be so naive as to think that this wouldn’t happen in a single-threaded environment: your runtime often runs separate threads you didn’t program, your OS libraries often do stuff in the background too, and your OS might just as well play with your allocators in the meantime. Even in a truly single-threaded program, there are still often multiple objects sharing resources and allocators. There’s no such thing as a single-threaded world. Even if you’re doing embedded work on tiny memories without any kernel administration, the allocators are still your interface, and you don’t know what they are doing with your memory unless you implement them yourself. And, have you implemented your own malloc and free?
Ok, so we’re facing a complicated problem that needs a solution. What are the options? Historically, it’s all described in the quote at the beginning of the post: the C way, that is, the manual way, or the Lisp way, that is, the automatic – garbage collected – way. There’s a third one, pioneered by C++ but only really exploited recently by the most modern C++ standards and by Rust: Ownership Semantics.
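As a first taste of that third option, here is a small sketch using `std::unique_ptr` from the C++ standard library (C++14 or later for `std::make_unique`). The `Tracked` type, the `consume` function and `ownership_demo` are hypothetical helpers invented for this illustration; the only point is that each resource has exactly one owner at any time, and that it is released exactly once, when its owner goes away.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical resource that counts how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
    Tracked(const Tracked&) = delete;             // a resource is not copyable
    Tracked& operator=(const Tracked&) = delete;
};
int Tracked::alive = 0;

// Taking the unique_ptr by value means the callee becomes the owner.
void consume(std::unique_ptr<Tracked> owned) {
    assert(owned != nullptr);
    assert(Tracked::alive == 1);
}   // owned is destroyed here, releasing the resource exactly once

void ownership_demo() {
    {
        auto p = std::make_unique<Tracked>();     // p owns the resource
        assert(Tracked::alive == 1);

        auto q = std::move(p);                    // ownership moves to q
        assert(p == nullptr);                     // p no longer refers to anything
        assert(Tracked::alive == 1);              // still exactly one resource

        consume(std::move(q));                    // ownership moves into the callee
        assert(q == nullptr);
        assert(Tracked::alive == 0);              // released when consume returned
    }
    assert(Tracked::alive == 0);                  // nothing left to forget or free twice
}
```

There is no explicit `delete` anywhere, so there is nothing to forget and nothing to call twice. After a move the old name holds a null pointer, so a stale use is at least a well-defined null check away from being caught rather than a silent dangling pointer.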