Resource Management in perspective: the Stack

An archetypical analogy of a stack data structure with a stack of plates.

Up to now, we have been focus­ing on where memory man­age­ment actu­ally occurs, the heap. But this is not the only memory seg­ment rel­ev­ant to a pro­gram. Indeed, in all garbage col­lec­ted tech­niques, there’re oth­er play­ers hav­ing a rel­ev­ant role in the game: the mutat­or and the root set. In the ref­er­ence coun­ted garbage col­lec­tion, it’s the mutat­or who dir­ects the life of its sub­jects, as it noti­fies in real time about the changes in ref­er­ences out­side of itself. In the tra­cing garbage col­lec­tion, the root set plays a fun­da­ment­al role, as all tra­cing starts in this set, but it’s again the mutat­or who dir­ects the life of the root set. The mutat­or is, essen­tially, com­posed of the Call Stack.

This call stack is a mas­ter­piece of com­puter engin­eer­ing all by itself. I can’t recall where was it intro­duced for the first time, but many prob­lems the call stack solves seem to be legit ori­gins. Indeed, it’s so uni­ver­sal these days that it’s hard to find a chip without sup­port for call stack struc­tures, and, at least to me, it’s hard to ima­gine a mod­el of com­pu­ta­tions without it — please tell me if you know of any, I really want to know!

The raison d’être of the call stack

Subroutines are an example of an idea that can be imple­men­ted with stacks. A call stack is a stack of call frames: each time a sub­routine is called, the return address of where this sub­routine should trans­fer con­trol back when it’s done is pushed onto the stack1. The call stack will record the address of the place in the .data seg­ment where it should tell the CPU to con­tin­ue execut­ing from once it’s done.

If a sub­routine needs para­met­ers to be passed, much in the style of a push­down auto­mata these para­met­ers can be pushed onto the stack. If the sub­routine needs loc­al vari­ables for its com­pu­ta­tions, it can keep its loc­al memory with­in its stack frame, on the call stack; once the sub­routine is done, it can simply drop the entire stack frame. Note how simple and quick it is to alloc­ate space with­in a stack frame: you only need a point­er to the begin­ning of the frame, and anoth­er point­er to the end, and expand­ing a frame is just mov­ing its top point­er fur­ther up, much in the way that brk works, but basic­ally instant at the lack of a sys­tem call — notice nev­er­the­less that this then depends on how much space is pre­vi­ously reserved for the stack: this is what is known as a Stack Overflow, when a new frame sur­passes the alloc­ated space and the pro­gram seg­faults, and the ker­nel kills you.

Parameter passing

Parameter passing is an inter­est­ing fea­ture. If you’ve done any Windows pro­gram­ming, you might have seen some­where keywords like __stdcall or __cdecl: this is a call­ing con­ven­tion. When a sub­routine calls anoth­er one, a bin­ary con­ven­tion needs to be estab­lished: where are para­met­ers passed, if by registers or by the stack, and in which order, how a return value is passed back to the caller, and who should clean the para­met­ers, if the caller or the callee. Let me share with you some anec­dot­al curi­os­it­ies. Let’s sup­pose a clas­sic x86 archi­tec­ture (32-bits), and the fol­low­ing equi­val­ent examples, with their com­piled machine code:

int __cdecl callee_cdecl(int a)
    return a + 1;

void caller()
    int a = callee_cdecl(1);
int __stdcall callee_stdcall(int a)
    return a + 1;

void caller_stdcall()
    int a = callee_stdcall(1);

cdecl_vs_stdcall-1024x412 Resource Management in perspective: the Stack

Notice the dif­fer­ences. It’s all on who is respons­ible to clean the giv­en para­met­er. In both cases, an int of value 1 is passed as a para­met­er when the caller says push 1. The caller then executes call, which saves the return address of the next instruc­tion in the caller on top of the stack and trans­fers con­trol to the callee. Now, the callee com­putes its value and saves it in the register eax, and it cleans up after him­self: here is where the dif­fer­ences starts. After pop­ping the stack frame’s base point­er (pop ebp), cde­cl’s callee then says ret 0, which means return exe­cu­tion to the address saved on top of the stack, while stdcall runs a ret 4, which means return exe­cu­tion to the address 4 bytes behind the top of the stack, updat­ing at once the stack frame’s top point­er. As cde­cl didn’t pop the para­met­er (the earli­er push 1), cde­cl then executes add esp,4, which basic­ally means move the stack frame’s top point­er 4 bytes up — exactly what ret 4 did in stdcall.

Now ima­gine the mess if we mix cde­cl callers with stdcall callees or vice versa. A cde­cl caller pops what the stdcall callee already popped, that is, we would have the stack point­er updated 8 bytes instead of 4, hence, con­sid­er­ing that instruc­tions are often executed by off­set­ting esp, we would have everything off­set: stack cor­rup­tion. Similar with an stdcall caller with a cde­cl callee: none of them pops the pushed para­met­er, so a ret call from the caller would actu­ally trans­fer con­trol to that int nev­er popped: hope­fully a seg­fault.

Stack allocations and deallocations

But let’s come back a bit, let’s pay atten­tion again to how memory man­age­ment on the stack works…

2q80v5 Resource Management in perspective: the Stack

Deallocating a stack frame is noth­ing else but updat­ing the stack frame and the stack point­er back to the pre­vi­ous stack frame base and top. The cells of the cur­rent stack frame are noth­ing but ignored and can be freely over­writ­ten on the next stack frame push or growth of the cur­rent one — in the very same way, they can be re-read by unini­tial­ized vari­ables of a sub­sequent stack frame alloc­ated in the same space. This is an imple­ment­a­tion detail.

But what does this means for the semantics of the lan­guage? It means that the vari­ables of your lan­guage can be divided in two kinds: those with value semantics, and those with point­er semantics. The former are those cells semantic­ally equi­val­ent to a value: int i = 3 means that i is a three, and not that i points to a dis­tant three. The lat­ter, are those cells semantic­ally equi­val­ent to an address: int* ptr_i = malloc(sizeof(int)) means that ptr_i is an iden­ti­fi­er to anoth­er cell, and not a value by itself.

A stack frame pop­ping will simply treat everything with­in itself as hav­ing value semantics. It will con­sider that ptr_i is a value, and make the stack for­get about some value val­ued (pun inten­ded) 0x7ffc78e71b60. This is, a stack frame pop­ping effect­ively cleans the int, and the point­er. But not the value the point­er points to.

At last, let me leave you with a recom­men­ded read, if you want to know more in depth the work­ings of the call stack:

  1. Before try­ing to pic­ture this in your mind, it is import­ant to remem­ber one thing: the stack does not con­tain execut­able code! For secur­ity reas­ons, your ker­nel can keep track of what parts of your code are execut­able, or read-write, or read-only, and the stack is just read-write, all execut­able code is read from the .data memory seg­ment alone.

Leave a Reply