How do you deal with memory in Assembly? Well, you basically tell the processor to access this field in memory, and don’t tell the processor to access this other one.
As I mentioned last time, from a traditional perspective memory can be managed manually or automatically. The assembly idea is where C builds onto, with added abstractions. A compiled C program is divided in regions: a static region of memory is provided in the binary at compile-time, where objects have a lifetime as long as its compiled binary is loaded in memory. For example, globals in a compilation unit (a .c file). Then there’s the automatic region, the glorious Stack. Here memory is allocated on the creation of a new stack frame, and automatically deallocated on the exit of such stack frame — this is why pointers to locals are so dangerous, the stack has a very unpredictable lifetime. Then, there’s the dynamic region, the famous Heap, where memory of arbitrary size can be requested at run-time. This is what is manually requested and freed, and where in reality most of the magic of resource management happens.
It is important to understand the heap in order to understand what malloc, free and garbage collectors, do.
What is that heap then?
Memory, huge pools of memory, need to be bookkeeped. Any system needs to keep track of what is used, in which size, by whom, with what permissions, etc. Traditionally, this is done by bookkeeping only what is needed, and considering the rest as all zeroed, all empty and ignored. Your allocator — the entity responsible for malloc-ing and free-ing things, essentially, for bookkeeping memory — keeps track of the address to the beginning of the heap (just after the data segment), and another address for the end of the section of memory he’s bookkeeping at the moment: the program break address 1. Anything else above that address is not bookkeeped (and that’s good, less job to do!), he — the allocator — has never given you any of it, so you’re not supposed to touch it anyway. And the kernel knows it. So if you ever dereference any pointer above the program counter, you’re making a mistake by definition, and the kernel is aware of it, so it will kill you — a hard segmentation fault.
How the kernel is aware of your program counter is a wonderful masterpiece. Basically, all programs run in virtual memory: any time you see in your assembly code any instruction for addressing an exact place in memory, that’s only an illusion; in reality, the kernel keeps track of the real address and translates in real time. Whatever your 0x00ff2374b2 (random number for the sake of an example) pointer means to you, a chip with support for virtual memory will notify the kernel that a memory access has been done, and the kernel will secretly give you something else instead, whatever, you’re not supposed to know it — and if you do, the kernel is broken, for security reasons. This is an extremely critical part of the kernel: if addresses must be translated on-the-fly, the kernel needs to keep track of what memory maps to what, and the data structures where he does also live in memory, which doubles the indirections and halves the performance, so the kernel keeps caches, and Translation Buffer Lookasides, and extra data in the Memory Management Unit of your chip, and a ton of other amazingly complicated techniques to keep this fast — a topic for another day.
So the kernel knows everything you have in memory. And your allocator knows he’s being watched. This way, everything “above” the program break doesn’t really exists yet, the kernel doesn’t have to prepare that memory for you, because you’re not supposed to use it anyway — and again, if you do, the kernel kills you. This is how processes can live under the illusion of infinite memory, and how the kernel can keep everybody coexisting in peace.
But how does the allocator cooperates with the kernel?
The allocator will basically request on demand big chunks of memory to the kernel, and then add this new chunk to the bookkeeping. Every time you request memory, the allocator will serve you from the chunks he has prepared for you, and when the allocator runs out of space in his chunks, he asks the kernel for more, and then continues bookkeeping it and serving you.
The optimizations are again amazing. System calls are expensive2, so we don’t want to do system calls every second instruction. This is why the allocator preallocates — requests — chunks in advance. Now, the allocator has his chunk — this is The Heap — and needs to bookkeep it. The strategies are brilliant pieces of engineering, just as the strategies the kernel uses to virtualise everybody are; and to an extent, the allocator often just mirrors the doings of the kernel; all of this, is implementation defined. Let’s build a picture of them.
We can keep a doubly-linked list of free memory blocks, this is the most common strategy. Blocks of what size? Another common strategy is keeping two linked lists, each of different memory block sizes, optimised for the most common request sizes. Or three or more linked lists for more and more different size requests… How to decide? Heuristically! Oh, and what about freeing? Should free take the pointer being freed only? What about its size? Can we keep a table of pointer and sizes? Or keep the size next to the allocated block3. Also, when to actually return memory to the kernel? After all, free only means back to the allocator, and it’s the allocator’s job to decide when to return it: will he do so too late, or too early? What if the block just below the program break is busy but everything else is free? Are we then wasting memory we can’t return to the kernel? So we add mmap4 to the kernel interface so we can request out-of-order chunks. And what about multithreading, if two requests to the same linked list arrive at once? Aha! Now you see the complexity of an allocator!
Allocators are very general-purpose things that want to make absolutely all patterns of allocations happy, and for this reason, they don’t really excel at anything, but rather, their job is just not to suck at anything, and that’s a lot to expect. For this reason, if you have a memory aggressive program and your allocator becomes a bottleneck, supplementing a new allocator optimised for your patterns of allocations — preallocate from the kernel a chunk as large as its optimal and bookkeep it the best way you need — can raise performance several orders of magnitude, and this is something for which C++ gives you a lot of opportunities through custom allocators, policies, and templates… excuse me for digressing again.
Next time, I’ll show you the many strategies for freeing what is not needed, exactly when is not needed, and only once.
- Reality is much more complicated than this, but bear with me, this is a very good picture to visualise what’s going on.
- The chip needs to isolate any possible wrong-doing of your program from the kernel and the rest of the system, so a system call, that is, transferring control to the kernel, needs to save the state of your program, prepare the new memory tables, change a lot of flags in the chip, then the kernel does his thing, then all the protections and states are reverted, and your process can continue…
- It is usually kept a word preceding the returned pointer for example, just try to read the contents of a malloc-ed pointer minus a word
- In the Unix world, brk and sbrk are used for moving the program break, and mmap for out-of-order memory mapping. In Windows you’ll need to check the documentation, but the spirit is pretty much the same.