Garbage and Islands of Isolation!

Garbage collection is the greatest mystery for most of the new programmers. Particularly because Sun/Oracle has managed to abstract the aspects of Garbage Collectors nicely away from them. This ultimately makes programmers who are freshly welcomed to the industry to write bad code for system internals. This article is a brief overview on Garbage Collection and the concept called Island of Isolation.

World before Garbage Collectors..

Twenty, thirty years back, memory was everything. There was no Java (Which was introduced back in ’95) and no Python — so you had to manage memory for your dense desktop applications or your complex algorithms. Well, they might have not been that complex because you might have not been able to run complex algorithm models with the machines which were present at that time. Things surely have now changed — but Moore Law is dead because of the lag of progression of memory over the years. So even today, managing memory is of vital importance but not as before where we had to go into the assembly level code to figure out the leakages of memory.

Garbage Collection — why?

In order to understand the core of Garbage Collection, you need a little bit of system architecture concepts. So we’ll review it right now.

In order to write programs, one must use variables and thus, one needs to access memory in order to store values. For an example, take this for loop.

for (int i = 0; i < n; ++i) {
std::cout << i << std::endl;
}

This is just a simple loop iterating over the set of integers from 0 to n . So there are some critical aspects of memory where a clever programmer must attend to.

Consider the variable i . It needs to be incremented according to the instruction ++i . In order to increment it, the program needs to remember the previous values of i thus inviting the need of memory or registers.

Programming Languages does not include these variables like i just in a random position in memory — quite the contrary — they have a specific paradigm — a well organized one — for allocating variables. Each runtime has its own stack where functions are loaded and each function has its own stack frame embedded inside the stack itself. When the function is fully executed, the frame regarding to that particular function is popped out of the stack with the all the variables, return values and links.

So this variable i is stored inside the stack frame of the function of this particular execution. Maybe it’s the main method, or a child method running inside the main method itself. In this case, the main method will hold its stack pointer until the child function has done execution, and will return to its stack pointer value when the child function is fully executed, and popped out from the stack.

With intense progression of programming paradigms, a new paradigm of OOP came into play with Classes and Objects — which gave the rise of Garbage Collection. Every object (Instance of a class) instantiated is allocated into the Heap. This is because the user does not need to worry about the lifetime of objects because the Heap is where the Garbage Collector does its job.

So why Garbage Collection? It is of vital importance to get rid of memory leaks in the Heap. Heed this classic example of showing the deallocation of a reference.

public void garbageCollectionEx() {
Object o = new Object();
Object p = new Object();
o = null;
}

The object which was referred by o is no more reachable at runtime. Thus the Garbage Collector will reach out and grab that particular object and remove it from the heap. The memory management then takes care of reducing the fragmentation which is left by the GC.

Note that Garbage Collection of stack variables is of no need because upon returning of a function, all the variables are anyway popped out because the stack frame which belongs to that particular function is popped out. For your convenience, I’ll add a stack frame of a function here. This is how stack frames are usually depicted in books.

Image for post
Image for post
Depiction of a Stack Frame of a function — Source: Dragon Book
Image for post
Image for post
How parent functions reach child functions using control links— Source:Dragon Book

Reachability for an object can be observed in multiple perspectives. The GC collects the object from the heap when it is unreachable or achieve null-reachability.

  1. Object Allocation — The binding between the reference and the object is taken place. Thus the object becomes reachable.
  2. Parameter Passing — The object remains reachable even though there would be some kind of referencing between the input parameter and the formal parameter towards the object. The formal parameter too will be referencing the object.
  3. Reference Assignments — This is the classic example of assigning multiple references and the former reference becoming useless where the latter reference will refer the object.
  4. Procedure Returns — Upon returning of a function, since the stack frame is popped, all the objects which was instantiated within the function becomes unreachable.
public void garbageCollectionEx() {
Object o = new Object();
Object p;
p = o;
}

But are these only ways for objects to become eligible for a GC?

Island of Isolation

Consider this diagram to understand how the concept of Island of Isolation has taken place and how it relates to Garbage Collection.

Image for post
Image for post
Object A and Object B are being referred by z and k, while x and y which are inside A and B, refers to Object B and Object A respectively

Consider a situation like the above. There, you have two objects called A and B, and each objects are instantiated somewhere. These objects are referred by a reference variable created by a mutator. A mutator is a program which does creating, updating and deleting objects.

Now say that we nullify z. Thus, we would not be able to reach A from z. But, still, we can reach A via k. We access B via k, and then we can reach A via y. Thus, A is still reachable.

Image for post
Image for post
Even if we nullify z, A is still reachable

Now, for argument’s sake, let us nullify k.

Image for post
Now, A and B both cannot be reached.

Note that we now cannot reach either A or B.

This phenomena is called as Island of Isolation. Objects in an island of isolation are eligible for Garbage Collection.

class IslandOfIsolation {
A z = new A();
B k = new B();
z.y = k;
k.x = z;
k = z = null;
// Objects previously referred by z and k are now isolated.
}
class A {
B y;
}
class B {
A x;
}

Island of Isolation is a very unique circumstance of reaching the eligibility for GC of an object. But how, in fact, do we determine the reachability of objects in other situations? To utilize this, there are two concepts that are utilized which are called Reference Counting and Trace-Based Garbage Collections. Trace Based GC will be discussed in another post.

Reference counting GC

Heed the four perspectives of reachability that we have already discussed so far. Those are,

  1. Object Allocation.
  2. Parameter Passing
  3. Reference Assignments
  4. Procedure Returns

Object Allocation

When an object is instantiated, the reference count for that particular object is incremented by 1.

Parameter Passing

For an example, say we pass our object into a method. Here, the input parameter is being assigned into the formal parameter of the method signature itself, and thus, the reference count is incremented by one. This is because the object is still reachable even though we assign the formal parameter as well to refer that particular object.

Reference Assignments

This is like assigning another reference variable for an object.

Object p = new Object();
Object o = new Object();
p = o;

The reference count for the object referred by o here, is incremented by 1, while the reference count of the old object which was referred by p decrements by 1. This is trivially understood because now, p will point to the object referred by o as well. The old object which was referred by p does not have a reference anymore (In this context).

Procedure Returns

Upon procedure returns (method returns) all the reference counts which was held by local variables of that particular activation record (stack frame) must be decremented. This is also trivial given the fact that we cannot access the local variables upon returning the procedure (popping the stack).

Transitive loss of Reachability

Whenever the reachability (reference count) becomes null (0), we must decrement all the reference counts of objects which are transitively reachable from the former object. That is, for an example, if A refers B and C, and B refers D, upon the null-reachability arises for A, the reference counts of B,C and D should be decremented.

Upon reaching 0 reference count, the object is collected by the GC.

Reference counting GCs are known to be quite expensive because of the fact that it should keep an atomic count of reference. Managing this count incurs a considerable amount of overheads.

Diving a bit further, languages like C/C++ shall not implement an automated GC because of the fact that they are type unsafe i.e. type casting can be done without any exceptions or errors. For an example, consider the code snippet below.

char arr[5];
arr[5] = 'd';

The above statement is syntactically correct. It does not violate any lexical preposition. However, intuition says that it is not correct as it shall yield some sort of a out-of-bounds error for the array. But in languages like C/C++, these kinds of statements are considered to be just meaningless. They do not lead to an error or to a throw of an exception.

But in type safe languages, these yield into errors. If you type the above snippet in Java, you’ll get an IndexOutOfBoundException .

Because C/C++ languages are type unsafe, pointer casting can be utilized in order to do further calculations with pointers — which leads us to a critical fact about GCs in type unsafe languages. Say you had a pointer to one particular memory location in the heap, and you do some pointer arithmetic on that particular pointer. Once you do that, since now your pointer points to another memory location, the former object is eligible towards GC. But if you reverse your pointer arithmetic now, you get a null pointer exception because the object which lay in the memory location you previously pointed through the pointer no more exists because the GC collected it. Thus, it is quite dangerous to have an automated GC in languages where you work with pointers — you usually do a manual garbage collection in those type of languages. However, now there are new concepts like Smart Pointers to make GC easier for the programmer.

Resource — Compilers, Principles and Techniques (2nd Edition) (Dragon Book)

Software Engineer @ CodeGen International, CSE Graduate @ University of Moratuwa

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store