Garbage Collectors in the real world

Well automatically cleaned factory

We saw the essen­tially two only ways to do Garbage Collection, and we saw their naive dis­ad­vant­ages. I say naive, because we saw the very simplist­ic imple­ment­a­tions. Like any­thing else in com­puter sci­ences, Garbage Collectors are an extremely optimised-(able) tool.

Lets give it a quick intu­it­ive thought. Tracing under­goes long pause times, mostly because it explores the entire heap at once, freez­ing the pro­gram to avoid data races between the GC and the pro­gram itself. A com­mon tech­nique for optim­ising mul­ti­th­readed pro­grams is to intro­duce more fine-grained lock­ing mech­an­isms; in this case, what if we incre­ment­ally trace smal­ler sec­tions of the memory graph instead of the entire graph at once?

Reference count­ing pays a price at the write bar­ri­er, which is spe­cially undesir­able when muta­tion rates are extremely high. Where is this rate extremely high? At the Call Stack. So it’s not sur­pris­ing to try to defer updates to the ref_count com­ing from stack frames. C++ and Rust go for a more soph­ist­ic­ated approach when pos­sible: to dis­tin­guish between bor­row­ing and shar­ing1; oth­er runtimes just skip the write bar­ri­er at a stack frame.

If tra­cing will trace incre­ment­al seg­ments of the heap, the prob­lem then turns into keep­ing all the seg­ments in sync: how to remove ele­ments of one seg­ment if we don’t know if oth­er seg­ments are point­ing to it? Then tra­cing intro­duces ref­er­ence count­ing between seg­ments, to keep track of how many cross-seg­ment ref­er­ences are liv­ing at a giv­en moment: a seg­ment can then be cleaned if the ref_count for this seg­ment is zero.

If write bar­ri­ers are skipped at the call stack, a resource can­not be release when the ref_count drops to zero, as there might be stack frames still point­ing to it: rather, they’re added to a Zero Count Table and every once in a while — just like in tra­cing — the runtime cal­cu­lates the real val­ues of the ele­ments of this table. References from the stack to the heap are traced, while ref­er­ences with­in the heap are coun­ted.

Towards a real-life GC: A Unified Theory

In what is a truly remark­able event, turns out that «all high-per­form­ance rel­ev­ant col­lect­or are in fact hybrids of tra­cing and ref­er­ence count­ing», quot­ing from Bacon et al.2, a paper you should def­in­itely read. Believe me, it’s a must in the world of Garbage Collection. In all humble, you can just grab a copy of the paper, for­get about read­ing me, just go read the ori­gin­al sources.

refcount-vs-tracing-comparison-revisited Garbage Collectors in the real world
An extens­ive con­trast between the two styles

Ok, so you’re still with me. Let’s com­pare these two styles of col­lec­tion. I said last time that «We can ima­gine that tra­cing starts with all nodes“ ref_count set to zero, and com­putes their incre­ments, while ref_counting starts with a ref_count set to the real value and com­putes the decre­ments». Let me add more, para­phras­ing Bacon’s paper: that’s what hap­pens at the mark­ing step, but then, at sweep­ing, tra­cing effect­ively tra­verses the graph from the roots, to find the live data, while ref_counting tra­verses the graph from the “anti-roots”, to find the dead data.

Tracing and Counting Hybrids

The algorithm we saw earli­er, where “ref­er­ences from the stack to the heap are traced, while ref­er­ences with­in the heap are coun­ted”, is called Deferred Reference Counting. As the root muta­tion rate is almost always extremely high, this algorithm moves the cost of con­sid­er­ing the roots from the applic­a­tion to the col­lect­or. On the oth­er hand, there’s a dual algorithm, where ref­er­ences from the stack to the heap are coun­ted and with­in the heap are traced. This dual algorithm, called Partial Tracing, has vir­tu­ally no imple­ment­a­tion, as it would have very poor per­form­ance for obvi­ous reas­ons.

Generational Garbage Collector

What per­haps is the biggest insight in the sci­ence of garbage col­lect­or, comes from an empir­ic­al obser­va­tion: object’s infant mor­tal­ity is enorm­ous. Objects usu­ally die young, that is, the chances that an object that has been just alloc­ated has to be killed instants after are quite large. Just think about tem­por­ar­ies of a func­tion: the func­tion gets called, cre­ates its tem­por­ar­ies, makes its com­pu­ta­tions, and returns: those tem­por­ar­ies were needed for a very small amount of time.

Hence, intu­it­ively, we can alloc­ate objects on a nurs­ery, garbage-col­lect it quite often, and when we see that a cer­tain object has sur­vived a few garbage-col­lec­tion passes, then it means it will prob­ably sur­vive for a long time, so we can move it to a mature space. Now, we can mix-nd-match all pos­sible per­muta­tions: two gen­er­a­tions, or three, or more, where the nurs­ery is ref_counted and the mature space is traced, or the nurs­ery traced and the mature ref_counted, or both traced or ref_counted, and where the trans-gen­er­a­tion­al ref­er­ence are ref_counted or traced dir­ectly or space-based…

Other optimizations

All memory man­age­ment suf­fers from one prob­lem in par­tic­u­lar: memory frag­ment­a­tion. If you recall from how the heap works, it hap­pens quite eas­ily that three small objects are alloc­ated con­tigu­ously and then the one in the middle gets deal­loc­ated, leav­ing a small whole. If sub­sequent requests for memory are nev­er small enough to fit in this whole, that whole is basic­ally lost — and it can get worse when you’ve vir­tu­ally (pun inten­ded) used all of your memory and requests fail, but you actu­ally have a lot of free space, just frag­men­ted.

So, what if we just move all the objects in memory every once in a while to refill the gaps? You know, just com­press them like in the pic below. But, pay very care­ful atten­tion: who­ever had a ref­er­ence to the sixth cell in the array would now have an inval­id point­er — all ref­er­ences to the moved cells need to be updated to their new loc­a­tion!

Fragmentation Garbage Collectors in the real world
Et violà! Memory defrag­men­ted

This is an idea from Cheeney’s algorithm, where a heap can actu­ally be divided into two, and col­lec­tion be reduced to copy­ing to the unused half of the heap only the liv­ing objects of the used half, then deal­loc­at­ing the first half at once in its entirety.

And this mix of gen­er­a­tions and cop­ies can be much more fine grained, with many many sub­di­vi­sions of memory where objects are traced with­in seg­ments, ref_counted between seg­ments, and copied across seg­ments, like in the example of the quite com­plex Train Algorithm3.

To wrap up

At last, let me leave you with the talk that gave me so much of my back­ground on the top­ic of Garbage Collection. Everything you ever wanted to know about the top­ic, at Mike’s talk and spe­cially at his bib­li­o­graphy. Have fun with his talk!

  1. Topic com­ing in the near future!
  2. A Unified Theory of Garbage Collection, by Bacon, Cheng, and Rajan, at IBM Watson Research Center; 2004
  3. A good review here:

Leave a Reply