Remote Calls in Distributed Systems

When dealing with Distributed Systems (DSes), there is one subtle — yet crucial point to heed: we are being deceived. We are being swindled by our very own procedures and algorithms. It is an illusion — a magnificent delusion deliberately devised for deception. Yet, it is one of the top-notch technical advancements the world has ever seen. Without it, there is no Google, no Facebook. Internet would become a very dull place had it been kept at bay. We all use distributed systems — aware or unbeknownst of its large-scale existence. We all ripe the fruits of DSes. But why (or how) does it deceive us? Because it has to. The term “Distribution” refers to “sharing”. That is, dividing up the workload among, say, peers. This meaning leads to subtle disparities as well; for an example, distribution of, say, newspapers, doesn’t mean a thing about dividing up a workload. But we will stick to our primary definition of distribution i.e. dividing up the work. In the context of technical bandwagons, it suits better and to be honest, it helps to make my point.

Since the “giant awakening” of the digital age, computation proliferated, and wants and needs of people got inclined to digital phenomena. Back then, we stored data in paperbacks, wrote books with a pen (remember how weary we were back then?), recorded harmonies in cassettes and stored our videos in CDs. But now, things have changed. We’ve become “digitized”. So much ubiquity has befallen on us to an extent that now we don’t even think about CDs and Cassettes. We can access anything, living anywhere! A witty reader should now be slightly skeptic. How can we access anything from anywhere? I have tons of data stored in my Google Drive. How selfish am I to forget data which you have stored in Google Drive! Think of the mountains of data residing Google Drive servers right now. How can they be stored? Can they be stored in a single machine? Is there some cosmic hard-drive, devised in Eden after the Fault which can store the great summits of information we shall remember?

That’s where “Distribution” comes into the stage. Say that you’ve been forced by your parents to paint a huge wall, and your cousin stays idle, laughing at you while you painfully paint the damn thing. What comes to your mind? To drag that cousin and make him paint a half. That’s called distribution — you reduce your workload by utilizing more resources. But here’s the catch. We all are sentient beings — we don’t want complexity associated to our mundane tasks. If we want, say, to store a photo of my dog in a cupboard with five lockers, we don’t want to cut the photo into five pieces and store in the lockers. We don’t want that hassle. After all, we’re the same perplexed individuals we were back then. Complexity is something we loathe; we want all the intricacies and complexities to become — wait for the golden word — transparent.

Say that we have to upload some image to Google Drive, as we did store the photo of my dog. You just click “Upload” and that’s it! Fire and Forget! Do you know where your file is now? We don’t have the foggiest idea on the form of existence of that particular image! Google may rip the image into pieces, store it in hundred machines geographically apart — we don’t care, as long as we can download it back in the same form we’ve uploaded. For all we know, we’ve uploaded an image — and Google nicely displays it in an interface as a single unique humdrum file, from which we are able to download the image back. The underlying complexities are hidden from us. The underlying intricacies are transparent. This is the deception in DSes. Aptly designed DSes focus on our ease, and withdraw us from the hassle of dealing with complex systems. What we see is only the surface — the surface that we’re used to. We’re not exposed to the monster within. We call such systems “Coherent Systems”.

Communication

So say that you have to calculate the n-th Fibonacci number, and you have a crappy machine. Your neighbor just bought a nice MacBook Pro. There’re two ways you can calculate the n-th Fib number now. You can either ask your neighbor, or ask the MacBook Pro. For the latter, you need to invoke calculate_fib(n) procedure in your neighbor’s machine. This means that you have to communicate with the machine.

Communication is a vast topic. When we’re working with multiple processes in a single machine, we have our main memory to the rescue (Similarly, when we’re working with threads, we have a shared memory space within processes). We can use our RAM as a shared memory space. But what about when our resources are distributed geographically? There’s no global memory stores to share. Thus, we communicate through messages. It’s like asking your mother to bring some chocolate when she comes home. You take a call (or message; same thing), ask her to buy a chocolate for you and she does. Conceptually speaking, she invokes buy_chocolate() function after your call. Meanwhile, you can patiently wait, eager for the tang of the choco. Or, you can just get on with what you’ve been doing in-prior to your sudden craving for chocolate. Similar phenomena occur in Distributed Systems. When, say, the client has to perform a computationally intensive task, and the client does not have the resources to commit to that particular task, it needs to delegate it to some other machine by communication. Or, when the client does not possess the resources needed to satisfy the requirements, it needs to direct the request to the correct endpoint to fetch the resource.

The idea I proposed involving calling your mother to bring you a chocolate is very simple and something that happens everyday. This was amalgamated into Distributed Systems, in the early 80’s by Birrell and Nelson [1]. The essence of their communication paradigm is, calling a remote (i.e. residing on another machine) procedure as if calling a procedure inside the client application itself. This idea is phenomenal as it encapsulates the transparency element as well, since the complexities of the remote procedure are to be hidden. We’ll clarify the details later.

RPC — Remote Procedure Call

So back to our calculate_fib(n) problem. Displayed below is how it theoretically should be done, according to our prior discussion.

Communicating with your neighbor to calculate_fib(n)

You invoke calculate_fib(n) in your machine and it should direct some sort of a communication call to the server’s procedure. Thereafter, the server should execute your call and return the results back to your machine.

But here’s the catch. How do we communicate? Our long-ignored networking comes into the stage.

Image Source: http://www.quickmeme.com/meme/3547dl

Communication Basics

So the whole concept of Networking came to facilitate communication between a multitude of parties. To state this precisely, DSes and Networking are one. One complements the other. Take the OSI stack, for an instance. Application Layer hides all the abstractions of Physical, Mac, and Network layers. They provide transparency to the layers above. Heard something similar before? DSes and Networks go hand in hand.

So, we’ll use our knowledge on Transport Layer (TL) to solve this mysterious problem of communicating to our neighbor’s computer. First, we need to establish some sort of a network, through which we can access our neighbor’s computer. Generally, this is done via a socket abstraction (Another term, really?). A socket, simply, is a combination of an internet address (IP address) and a port. This is the abstract concept that we use as programmers to access remote interfaces. We aren’t bothered with all the protocols below Transport Layer (that’s for network engineers discretion) — we’re focused above the TL. Each socket that are being used are allocated a process, and the mapping between the socket and the processes is 1:1 i.e. one socket can bind only one process.

Image Reference: George, Coulouris; Distributed Systems Concepts and Design

In order to communicate through a socket, one must first establish a channel i.e. the pathway of communication between client’s socket and the agreed socket of the server. Through this channel, communication occurs and it is shut down afterwards. The retention period of the channel differs according to the protocol. In WebSocket, the established connection generally retains until either party decides to drop it (thus, it is heavily used for real-time communication). In general HTTP (SOAP/REST), the connection is disconnected once the communication of messages is done.

After establishing the connection, now it’s time to send the message. From Networking Fundamentals 101, we know that application-level messages cannot be transmitted via socket channels in their purest form. They need to be serialized into frames, which are further serialized into byte arrays in lower level layers of the OSI stack.

Image Reference: George, Coulouris; Distributed Systems Concepts and Design

So, we need some piece of software which we can use in order to send our message calculate_fib(n) to our neighbor’s machine. This piece of software should convert (marshal, as the nomenclature defines it) all the parameters and method signatures we issue, deal with underlying communication protocols, and provide us with the transparency beneath Transport Layer.

Welcome, Stubs.

That particular piece of software is called stub. I don’t know how the name got its place though. Interesting…

So the stub is responsible for putting the parameters and the name of the method (or some sort of an identifier) or method signature in the message, which is thereafter deserialized by the neighbor’s stub. When the neighbor’s machine is done with calculating the intensive calculate_fib(n) the neighbor’s stub serializes the result and sends it as a message to your stub, which thereafter unmarshells the result and return it to you. This is the underlying invisible dance that happens in simple RPC calls. It’s like clockwork! A stub is a friendly face which abstracts away the monster underneath. Hail the stub creators!

Let’s implement our Fib!

Abstraction is a widely used concept. Even in OOP, we use it in large-scale, particularly when dealing with Interfaces. An interface is some sort of a structure we expose to the rest of the world (Or as an API) to denote what is possible from some entity, or, to depict the overall structure of the entity. Implementing our RPC starts with defining an interface which specifies our program, the methods and its other parameters.The language we use to define this interface is, trivially, called Interface Definition Language (IDL). This specifies the name of the program, its parameters and identifiers for each procedure it consists of.

IDL.x which defines the interface for client server communication

In this example, the CALCULATE_FIB(int) procedure is declared to be procedure 1, in version 1 of the remote program FIBCALCULATION, with the program number 0x20000099. (It is the generally accepted convention to have user-defined program numbers starting from 0x20000000) Notice that everything is declared with all capital letters. This is not required, but is a good convention to follow [5].

Version numbers are incremented when functionality is changed in the remote program. Existing procedures can be changed or new ones can be added. More than one version of a remote program can be defined and a version can have more than one procedure defined [4].

Now, we’ll use rpcgen to generate the client and server stubs. Note that I use a Linux machine for this, which is recommended. But you can use some terminal client like PuTTY for if you’re in Windows.

% rpcgen -a -C IDL.x

The resulting four files:

IDL_client.c  IDL_clnt.c  IDL.h  IDL_server.c  IDL_svc.c  IDL.x  Makefile.IDL

Files IDL_client.c and IDL_server.c are used to define the functions and procedures we’re going to use. Other files are mere helpers for rpcgen. The header files generated i.e. IDL.h must be shared amongst the client and the server since it encapsulates all the #define statements. We’ll examine IDL_server.c and IDL.client.c one by one. Note that the stubs are IDL_clnt.c (which is the client stub) and IDL_svc.c, (which is the server stub) which we do not modify (rpcgen asks us not to modify them, and you can see it if you open those files)

IDL_server.c

We have implemented calculate_fib(n) recursively and called that function in calculate_fib_1_svc function call. It’s trivial.

IDL_client.c

This is where some interesting code fragments reside. First, take your attention to the following code line.

clnt = clnt_create (host, FIBCALCULATION, FIBCALCULATION_VERSIONONE, "udp");

This creates a handle used for calling FIBCALCULATION on the server designated on the command line [5]. This client handle is passed to the stub routine that calls the remote procedure. The connection protocol we have established is UDP (You can change it to TCP) i.e. connection-less communication. I have modified the generated method signature of fibcalculation_1 to have another parameter to hold n . When fibcalculation_1 is called in main() of the IDL_client.c , we pass two attributes: 1) Server IP 2) n . Therefore, I have converted the parameter string nto an integer and passed it to fibcalculation_1. Furthermore, I have set calculate_fib_1_arg parameter to n .

Now, we’ll run the server using the command below. That is, this the piece of object code that we run on the server.

% sudo ./IDL_server

To run the client, we have to provide the server IP and n as command line arguments. That is, this is the code that we run inside the client.

% sudo ./IDL_client localhost 10
Client calling the locally hosted server with n = 10. The server sends the response 55.
Server got called with n = 10.

Notice what happens. When the server is up and running (using sudo IDL_server.c ), the server exposes an interface (which we’ve defined in the IDL and its stub). The client, thereafter, would join through a TCP connection mediated by its stub (IDL_clnt.c ). The client would invoke its very own fibcalculation_1 function which calculates the n-th fib number. This invocation, like any other function call in programming languages, calls up the calling sequence with all its variables and parameters stacked up. Inside the client’s invocation of fibcalculaltion_1 , there lies the invocation of calculate_fib_1 (line 26) through which the client stub is invoked. The stub will take care of the rest of communication, unbeknownst to the client i.e. the client not being aware of any sort of a TCP transmission. As long as the client is concerned, it’s just as if the client invoked a routine call calculate_fib_1 , and it returned the result. The underlying communication occurs via stubs whose implementation is hidden from the client module and server module.

This might seem a little deceiving: especially if you’ve looked thoroughly at the client code. There’s no hint of an idea that the server did the calculation. But that’s just the way I’ve coded this. Always remember that we did not give the Fibonacci implementation to the client. We specified it in the server. So the result must have gotten to the client from the server. Well, you can obviously printf() in the server after the calculation. You’re more than welcome to do so.

So, a summary.

What I have tried, in my shrewd ways of conveying information, is to give you some sort of an introductory overview on RPC communication. RPC is the foundation for many application level protocols in network communication and the inception of RPC and RMI was kind of a revolution back in the 80’s. The commonplace delicacies would be very hard to implement if certain levels of abstractions were absent and complexities involved a large workload. Modular architectures abstracted away most of the underlying difficulties and have eased implementing communication protocols in distributed systems. I have tried to convey the extent to which transparency has alleviated our implementations. For the sake of completeness, I have implemented a small RPC message passing client-server communication where the server calculates the n-th Fibonacci number. However, in the real world, this calculate_fib() function might be a very intensive operation which takes many compute cycles — hence the need for offloading the computation to a more resourceful server. In today’s context, most communication are event-based and asynchronous, i.e. clients don’t wait until the server does its job, and the servers don’t wait until a particular client sends him a request. Things have changed but the underlying fundamentals remain the same. The legacy RPC protocol was devised to be of synchronous in nature, as depicted in the figure below.

Image source: Distributed Systems Principles Andrew Tanenbaum

There are a lot of other issues related to implementing transparent distributed systems, such as synchronization issues with respect to logical clocks, which I have not emphasized here. Those issues introduce more complexities into communication and it goes way beyond the scope of a simple article. Therefore, we will stop here for now.

Wait! Not yet.

Before stopping, I would like to address some fluids of practicality here. To us all who got into Software Engineering recently, all these RPC calls seem very alienated. Most of us have not used legacy RPC mechanisms; what we’re used to is more REST-like communication protocols. How can we connect these two? Well, if you think about it, it is sort of a trivial matter. RPC stands for Remote Procedure Call i.e. invoking a procedure stored in a remote machine. Why can’t we do it using HTTP (which is an application-level protocol)? Heed this simple HTTP POST call.

POST /sayHello HTTP/1.1
HOST: api.example.com
Content-Type: application/json
{"name": "John Doe"}

Through this call, we can invoke a remote sayHello function with the argument “John Doe" . To illustrate this further, take this simple NodeJS controller code fragment.

app.post("/sayHello?name='John Doe'", () => {sayHello(req.query.name); }

Or, in Spring Boot (Java)

@PostMapping("/sayHello")
public void sayHello(@RequestParam String name) {
// implementation of the function}

Therefore, HTTP can also be used as a protocol for RPC. But this brings us to all these REST-like stuff. What’s the difference? REST is a mere style of architecture that we communicate. In REST, people encourage the structure or representation of data which we communicate. Take this very comprehensive contrast which I’ve copied and pasted from StackOverflow.

In general RPC, since we’re invoking procedures, they seem like “commands” or “instructions”. In REST, we focus on resources, not on commands. A more comprehensive analysis on REST vs general RPC is written on one of my previous blog posts. REST is also a style of communication implemented on top of HTTP.

OK, we’ll stop.

You’re always welcome to suggest any changes to this post’s content. I will list the references below and believe me when I say this, these are very comprehensive references. Further reading is highly encouraged.

References

[1] Andrew D. Birrell and Bruce Jay Nelson. 1984. Implementing remote procedure calls. ACM Trans. Comput. Syst. 2, 1 (February 1984), 39–59. DOI=http://dx.doi.org/10.1145/2080.357392

[2] George Coulouris, Jean Dollimore, Tim Kindberg, and Gordon Blair. 2011. Distributed Systems: Concepts and Design (5th ed.). Addison-Wesley Publishing Company, , USA.

[3] Andrew S. Tanenbaum and Maarten van Steen. 2006. Distributed Systems: Principles and Paradigms (2nd Edition). Prentice-Hall, Inc., Upper Saddle River, NJ, USA.

[4] “Chapter 3 rpcgen Programming Guide,” Chapter 3 rpcgen Programming Guide (ONC Developer’s Guide). [Online]. Available: https://docs.oracle.com/cd/E19683-01/816-1435/6m7rrfn7f/index.html. [Accessed: 04-Nov-2019].

[5] rpcgen Programming Guide

Software Engineer, CSE Graduate @ University of Moratuwa