Far From Ready — Marauding through what's left of Europe

Garbage Collectors in the real world

Lorem ipsum dolor sit amet, consectetur adipisicing elit.

2019-01-22

automatic-garbage-collection garbage-collection memory-management reference-counting resource-management tracing-garbage-collection unified-theory-of-garbage-collection

We saw the essentially two only ways to do Garbage Collection, and we saw their naive disadvantages. I say naive, because we saw the very simplistic implementations. Like anything else in computer sciences, Garbage Collectors are an extremely optimised-(able) tool.

Lets give it a quick intuitive thought. Tracing undergoes long pause times, mostly because it explores the entire heap at once, freezing the program to avoid data races between the GC and the program itself. A common technique for optimising multithreaded programs is to introduce more fine-grained locking mechanisms; in this case, what if we incrementally trace smaller sections of the memory graph instead of the entire graph at once?

Reference counting pays a price at the write barrier, which is specially undesirable when mutation rates are extremely high. Where is this rate extremely high? At the Call Stack. So it's not surprising to try to defer updates to the ref_count coming from stack frames. C++ and Rust go for a more sophisticated approach when possible: to distinguish between borrowing and sharing[note]Topic coming in the near future![/note]; other runtimes just skip the write barrier at a stack frame.

If tracing will trace incremental segments of the heap, the problem then turns into keeping all the segments in sync: how to remove elements of one segment if we don't know if other segments are pointing to it? Then tracing introduces reference counting between segments, to keep track of how many cross-segment references are living at a given moment: a segment can then be cleaned if the ref_count for this segment is zero.

If write barriers are skipped at the call stack, a resource cannot be release when the ref_count drops to zero, as there might be stack frames still pointing to it: rather, they're added to a Zero Count Table and every once in a while — just like in tracing — the runtime calculates the real values of the elements of this table. References from the stack to the heap are traced, while references within the heap are counted.

Towards a real-life GC: A Unified Theory

In what is a truly remarkable event, turns out that «all high-performance relevant collector are in fact hybrids of tracing and reference counting», quoting from Bacon et al.[note]A Unified Theory of Garbage Collection, by Bacon, Cheng, and Rajan, at IBM Watson Research Center; 2004[/note], a paper you should definitely read. Believe me, it's a must in the world of Garbage Collection. In all humble, you can just grab a copy of the paper, forget about reading me, just go read the original sources.

An extensive contrast between the two styles

Ok, so you're still with me. Let's compare these two styles of collection. I said last time that «We can imagine that tracing starts with all nodes' ref_count set to zero, and computes their increments, while ref_counting starts with a ref_count set to the real value and computes the decrements». Let me add more, paraphrasing Bacon's paper: that's what happens at the marking step, but then, at sweeping, tracing effectively traverses the graph from the roots, to find the live data, while ref_counting traverses the graph from the "anti-roots", to find the dead data.

Tracing and Counting Hybrids

The algorithm we saw earlier, where "references from the stack to the heap are traced, while references within the heap are counted", is called Deferred Reference Counting. As the root mutation rate is almost always extremely high, this algorithm moves the cost of considering the roots from the application to the collector. On the other hand, there's a dual algorithm, where references from the stack to the heap are counted and within the heap are traced. This dual algorithm, called Partial Tracing, has virtually no implementation, as it would have very poor performance for obvious reasons.

Generational Garbage Collector

What perhaps is the biggest insight in the science of garbage collector, comes from an empirical observation: object's infant mortality is enormous. Objects usually die young, that is, the chances that an object that has been just allocated has to be killed instants after are quite large. Just think about temporaries of a function: the function gets called, creates its temporaries, makes its computations, and returns: those temporaries were needed for a very small amount of time.

Hence, intuitively, we can allocate objects on a nursery, garbage-collect it quite often, and when we see that a certain object has survived a few garbage-collection passes, then it means it will probably survive for a long time, so we can move it to a mature space. Now, we can mix-nd-match all possible permutations: two generations, or three, or more, where the nursery is ref_counted and the mature space is traced, or the nursery traced and the mature ref_counted, or both traced or ref_counted, and where the trans-generational reference are ref_counted or traced directly or space-based...

Other optimizations

All memory management suffers from one problem in particular: memory fragmentation. If you recall from how the heap works, it happens quite easily that three small objects are allocated contiguously and then the one in the middle gets deallocated, leaving a small whole. If subsequent requests for memory are never small enough to fit in this whole, that whole is basically lost — and it can get worse when you've virtually (pun intended) used all of your memory and requests fail, but you actually have a lot of free space, just fragmented.

So, what if we just move all the objects in memory every once in a while to refill the gaps? You know, just compress them like in the pic below. But, pay very careful attention: whoever had a reference to the sixth cell in the array would now have an invalid pointer — all references to the moved cells need to be updated to their new location!

Memory defragmentation.

Et violà! Memory defragmented

This is an idea from Cheeney's algorithm, where a heap can actually be divided into two, and collection be reduced to copying to the unused half of the heap only the living objects of the used half, then deallocating the first half at once in its entirety.

And this mix of generations and copies can be much more fine grained, with many many subdivisions of memory where objects are traced within segments, ref_counted between segments, and copied across segments, like in the example of the quite complex Train Algorithm[note]A good review here: https://youtu.be/kpW4lCwQWHc[/note].

To wrap up

At last, let me leave you with the talk that gave me so much of my background on the topic of Garbage Collection. Everything you ever wanted to know about the topic, at Mike's talk and specially at his bibliography. Have fun with his talk!

https://youtu.be/t8dj56h2gbg