Resource Management in perspective: the Heap

The Heap seen as a card catalogue

How do you deal with memory in Assembly? Well, you basic­ally tell the pro­cessor to access this field in memory, and don’t tell the pro­cessor to access this oth­er one.

As I men­tioned last time, from a tra­di­tion­al per­spect­ive memory can be man­aged manu­ally or auto­mat­ic­ally. The assembly idea is where C builds onto, with added abstrac­tions. A com­piled C pro­gram is divided in regions: a stat­ic region of memory is provided in the bin­ary at com­pile-time, where objects have a life­time as long as its com­piled bin­ary is loaded in memory. For example, glob­als in a com­pil­a­tion unit (a .c file). Then there’s the auto­mat­ic region, the glor­i­ous Stack. Here memory is alloc­ated on the cre­ation of a new stack frame, and auto­mat­ic­ally deal­loc­ated on the exit of such stack frame — this is why point­ers to loc­als are so dan­ger­ous, the stack has a very unpre­dict­able life­time. Then, there’s the dynam­ic region, the fam­ous Heap, where memory of arbit­rary size can be reques­ted at run-time. This is what is manu­ally reques­ted and freed, and where in real­ity most of the magic of resource man­age­ment hap­pens.

It is import­ant to under­stand the heap in order to under­stand what mal­loc, free and garbage col­lect­ors, do.

What is that heap then?

C-memlayout Resource Management in perspective: the Heap

Memory, huge pools of memory, need to be book­keeped. Any sys­tem needs to keep track of what is used, in which size, by whom, with what per­mis­sions, etc. Traditionally, this is done by book­keep­ing only what is needed, and con­sid­er­ing the rest as all zer­oed, all empty and ignored. Your alloc­at­or — the entity respons­ible for mal­loc-ing and free-ing things, essen­tially, for book­keep­ing memory — keeps track of the address to the begin­ning of the heap (just after the data seg­ment), and anoth­er address for the end of the sec­tion of memory he’s book­keep­ing at the moment: the pro­gram break address 1. Anything else above that address is not book­keeped (and that’s good, less job to do!), he — the alloc­at­or — has nev­er giv­en you any of it, so you’re not sup­posed to touch it any­way. And the ker­nel knows it. So if you ever derefer­ence any point­er above the pro­gram counter, you’re mak­ing a mis­take by defin­i­tion, and the ker­nel is aware of it, so it will kill you — a hard seg­ment­a­tion fault.

How the ker­nel is aware of your pro­gram counter is a won­der­ful mas­ter­piece. Basically, all pro­grams run in vir­tu­al memory: any time you see in your assembly code any instruc­tion for address­ing an exact place in memory, that’s only an illu­sion; in real­ity, the ker­nel keeps track of the real address and trans­lates in real time. Whatever your 0x00ff2374b2 (ran­dom num­ber for the sake of an example) point­er means to you, a chip with sup­port for vir­tu­al memory will noti­fy the ker­nel that a memory access has been done, and the ker­nel will secretly give you some­thing else instead, whatever, you’re not sup­posed to know it — and if you do, the ker­nel is broken, for secur­ity reas­ons. This is an extremely crit­ic­al part of the ker­nel: if addresses must be trans­lated on-the-fly, the ker­nel needs to keep track of what memory maps to what, and the data struc­tures where he does also live in memory, which doubles the indir­ec­tions and halves the per­form­ance, so the ker­nel keeps caches, and Translation Buffer Lookasides, and extra data in the Memory Management Unit of your chip, and a ton of oth­er amaz­ingly com­plic­ated tech­niques to keep this fast — a top­ic for anoth­er day.

So the ker­nel knows everything you have in memory. And your alloc­at­or knows he’s being watched. This way, everything “above” the pro­gram break doesn’t really exists yet, the ker­nel doesn’t have to pre­pare that memory for you, because you’re not sup­posed to use it any­way — and again, if you do, the ker­nel kills you. This is how pro­cesses can live under the illu­sion of infin­ite memory, and how the ker­nel can keep every­body coex­ist­ing in peace.

But how does the allocator cooperates with the kernel?

The alloc­at­or will basic­ally request on demand big chunks of memory to the ker­nel, and then add this new chunk to the book­keep­ing. Every time you request memory, the alloc­at­or will serve you from the chunks he has pre­pared for you, and when the alloc­at­or runs out of space in his chunks, he asks the ker­nel for more, and then con­tin­ues book­keep­ing it and serving you.

card-catalog-as-a-heap-1024x677 Resource Management in perspective: the Heap
The Heap seen as a Card Catalogue: we keep track of what’s used by what info, and if we need more space, we con­tin­ue request­ing ward­robes of cata­logues to our boss. If we use what he hasn’t giv­en us, we lose our job: he fires us.

The optim­iz­a­tions are again amaz­ing. System calls are expens­ive2, so we don’t want to do sys­tem calls every second instruc­tion. This is why the alloc­at­or preal­loc­ates — requests — chunks in advance. Now, the alloc­at­or has his chunk — this is The Heap — and needs to book­keep it. The strategies are bril­liant pieces of engin­eer­ing, just as the strategies the ker­nel uses to vir­tu­al­ise every­body are; and to an extent, the alloc­at­or often just mir­rors the doings of the ker­nel; all of this, is imple­ment­a­tion defined. Let’s build a pic­ture of them.

We can keep a doubly-linked list of free memory blocks, this is the most com­mon strategy. Blocks of what size? Another com­mon strategy is keep­ing two linked lists, each of dif­fer­ent memory block sizes, optim­ised for the most com­mon request sizes. Or three or more linked lists for more and more dif­fer­ent size requests… How to decide? Heuristically! Oh, and what about free­ing? Should free take the point­er being freed only? What about its size? Can we keep a table of point­er and sizes? Or keep the size next to the alloc­ated block3. Also, when to actu­ally return memory to the ker­nel? After all, free only means back to the alloc­at­or, and it’s the allocator’s job to decide when to return it: will he do so too late, or too early? What if the block just below the pro­gram break is busy but everything else is free? Are we then wast­ing memory we can’t return to the ker­nel? So we add mmap4 to the ker­nel inter­face so we can request out-of-order chunks. And what about mul­ti­th­read­ing, if two requests to the same linked list arrive at once? Aha! Now you see the com­plex­ity of an alloc­at­or!

Allocators are very gen­er­al-pur­pose things that want to make abso­lutely all pat­terns of alloc­a­tions happy, and for this reas­on, they don’t really excel at any­thing, but rather, their job is just not to suck at any­thing, and that’s a lot to expect. For this reas­on, if you have a memory aggress­ive pro­gram and your alloc­at­or becomes a bot­tle­neck, sup­ple­ment­ing a new alloc­at­or optim­ised for your pat­terns of alloc­a­tions — preal­loc­ate from the ker­nel a chunk as large as its optim­al and book­keep it the best way you need — can raise per­form­ance sev­er­al orders of mag­nitude, and this is some­thing for which C++ gives you a lot of oppor­tun­it­ies through cus­tom alloc­at­ors, policies, and tem­plates… excuse me for digress­ing again.

Next time, I’ll show you the many strategies for free­ing what is not needed, exactly when is not needed, and only once.

  1. Reality is much more com­plic­ated than this, but bear with me, this is a very good pic­ture to visu­al­ise what’s going on.
  2. The chip needs to isol­ate any pos­sible wrong-doing of your pro­gram from the ker­nel and the rest of the sys­tem, so a sys­tem call, that is, trans­fer­ring con­trol to the ker­nel, needs to save the state of your pro­gram, pre­pare the new memory tables, change a lot of flags in the chip, then the ker­nel does his thing, then all the pro­tec­tions and states are rever­ted, and your pro­cess can con­tin­ue…
  3. It is usu­ally kept a word pre­ced­ing the returned point­er for example, just try to read the con­tents of a mal­loc-ed point­er minus a word
  4. In the Unix world, brk and sbrk are used for mov­ing the pro­gram break, and mmap for out-of-order memory map­ping. In Windows you’ll need to check the doc­u­ment­a­tion, but the spir­it is pretty much the same.

Leave a Reply