Garbage Collectors in the real world

Well automatically cleaned factory

We saw the essen­tially two only ways to do Garbage Collection, and we saw their naive dis­ad­vant­ages. I say naive, because we saw the very simplist­ic imple­ment­a­tions. Like any­thing else in com­puter sci­ences, Garbage Collectors are an extremely optimised-(able) tool.

Lets give it a quick intu­it­ive thought. Tracing under­goes long pause times, mostly because it explores the entire heap at once, freez­ing the pro­gram to avoid data races between the GC and the pro­gram itself. A com­mon tech­nique for optim­ising mul­ti­th­readed pro­grams is to intro­duce more fine-grained lock­ing mech­an­isms; in this case, what if we incre­ment­ally trace smal­ler sec­tions of the memory graph instead of the entire graph at once?

Reference count­ing pays a price at the write bar­ri­er, which is spe­cially undesir­able when muta­tion rates are extremely high. Where is this rate extremely high? At the Call Stack. So it’s not sur­pris­ing to try to defer updates to the ref_count com­ing from stack frames. C++ and Rust go for a more soph­ist­ic­ated approach when pos­sible: to dis­tin­guish between bor­row­ing and shar­ing1; oth­er runtimes just skip the write bar­ri­er at a stack frame.

If tra­cing will trace incre­ment­al seg­ments of the heap, the prob­lem then turns into keep­ing all the seg­ments in sync: how to remove ele­ments of one seg­ment if we don’t know if oth­er seg­ments are point­ing to it? Then tra­cing intro­duces ref­er­ence count­ing between seg­ments, to keep track of how many cross-seg­ment ref­er­ences are liv­ing at a giv­en moment: a seg­ment can then be cleaned if the ref_count for this seg­ment is zero.

If write bar­ri­ers are skipped at the call stack, a resource can­not be release when the ref_count drops to zero, as there might be stack frames still point­ing to it: rather, they’re added to a Zero Count Table and every once in a while — just like in tra­cing — the runtime cal­cu­lates the real val­ues of the ele­ments of this table. References from the stack to the heap are traced, while ref­er­ences with­in the heap are counted.

Towards a real-life GC: A Unified Theory

In what is a truly remark­able event, turns out that «all high-per­form­ance rel­ev­ant col­lect­or are in fact hybrids of tra­cing and ref­er­ence count­ing», quot­ing from Bacon et al.2, a paper you should def­in­itely read. Believe me, it’s a must in the world of Garbage Collection. In all humble, you can just grab a copy of the paper, for­get about read­ing me, just go read the ori­gin­al sources.

refcount-vs-tracing-comparison-revisited Garbage Collectors in the real world
An extens­ive con­trast between the two styles

Ok, so you’re still with me. Let’s com­pare these two styles of col­lec­tion. I said last time that «We can ima­gine that tra­cing starts with all nodes“ ref_count set to zero, and com­putes their incre­ments, while ref_counting starts with a ref_count set to the real value and com­putes the decre­ments». Let me add more, para­phras­ing Bacon’s paper: that’s what hap­pens at the mark­ing step, but then, at sweep­ing, tra­cing effect­ively tra­verses the graph from the roots, to find the live data, while ref_counting tra­verses the graph from the “anti-roots”, to find the dead data.

Tracing and Counting Hybrids

The algorithm we saw earli­er, where “ref­er­ences from the stack to the heap are traced, while ref­er­ences with­in the heap are coun­ted”, is called Deferred Reference Counting. As the root muta­tion rate is almost always extremely high, this algorithm moves the cost of con­sid­er­ing the roots from the applic­a­tion to the col­lect­or. On the oth­er hand, there’s a dual algorithm, where ref­er­ences from the stack to the heap are coun­ted and with­in the heap are traced. This dual algorithm, called Partial Tracing, has vir­tu­ally no imple­ment­a­tion, as it would have very poor per­form­ance for obvi­ous reasons.

Generational Garbage Collector

What per­haps is the biggest insight in the sci­ence of garbage col­lect­or, comes from an empir­ic­al obser­va­tion: object’s infant mor­tal­ity is enorm­ous. Objects usu­ally die young, that is, the chances that an object that has been just alloc­ated has to be killed instants after are quite large. Just think about tem­por­ar­ies of a func­tion: the func­tion gets called, cre­ates its tem­por­ar­ies, makes its com­pu­ta­tions, and returns: those tem­por­ar­ies were needed for a very small amount of time.

Hence, intu­it­ively, we can alloc­ate objects on a nurs­ery, garbage-col­lect it quite often, and when we see that a cer­tain object has sur­vived a few garbage-col­lec­tion passes, then it means it will prob­ably sur­vive for a long time, so we can move it to a mature space. Now, we can mix-nd-match all pos­sible per­muta­tions: two gen­er­a­tions, or three, or more, where the nurs­ery is ref_counted and the mature space is traced, or the nurs­ery traced and the mature ref_counted, or both traced or ref_counted, and where the trans-gen­er­a­tion­al ref­er­ence are ref_counted or traced dir­ectly or space-based… 

Other optimizations

All memory man­age­ment suf­fers from one prob­lem in par­tic­u­lar: memory frag­ment­a­tion. If you recall from how the heap works, it hap­pens quite eas­ily that three small objects are alloc­ated con­tigu­ously and then the one in the middle gets deal­loc­ated, leav­ing a small whole. If sub­sequent requests for memory are nev­er small enough to fit in this whole, that whole is basic­ally lost — and it can get worse when you’ve vir­tu­ally (pun inten­ded) used all of your memory and requests fail, but you actu­ally have a lot of free space, just fragmented.

So, what if we just move all the objects in memory every once in a while to refill the gaps? You know, just com­press them like in the pic below. But, pay very care­ful atten­tion: who­ever had a ref­er­ence to the sixth cell in the array would now have an inval­id point­er — all ref­er­ences to the moved cells need to be updated to their new location!

Fragmentation Garbage Collectors in the real world
Et violà! Memory defragmented

This is an idea from Cheeney’s algorithm, where a heap can actu­ally be divided into two, and col­lec­tion be reduced to copy­ing to the unused half of the heap only the liv­ing objects of the used half, then deal­loc­at­ing the first half at once in its entirety.

And this mix of gen­er­a­tions and cop­ies can be much more fine grained, with many many sub­di­vi­sions of memory where objects are traced with­in seg­ments, ref_counted between seg­ments, and copied across seg­ments, like in the example of the quite com­plex Train Algorithm3.

To wrap up

At last, let me leave you with the talk that gave me so much of my back­ground on the top­ic of Garbage Collection. Everything you ever wanted to know about the top­ic, at Mike’s talk and spe­cially at his bib­li­o­graphy. Have fun with his talk!

  1. Topic com­ing in the near future!
  2. A Unified Theory of Garbage Collection, by Bacon, Cheng, and Rajan, at IBM Watson Research Center; 2004
  3. A good review here:

Leave a Reply