Developing Your Own Online IDE and Code Compiler on Google Cloud Platform — Part 1

Introduction

Faisal Alam
6 min read · Mar 4, 2020

Have you ever wanted to create your own online IDE, similar to the CodeChef IDE, the GeeksforGeeks IDE, or any of your other favourite online IDEs? In this series of articles, we will design and implement our own online code compiler that can scale elastically on Google Cloud Platform. Of course, we will keep in mind the limits and quotas that GCP enforces per project; you can simply submit a quota increase request on GCP if you need to scale beyond the default limits.

You can see the IDE that I have created here at this link: https://ide.ctfhub.io.

There will be four articles in this series. In this article, we are going to discuss the high-level architecture of the various components and how each of them will scale. Over the next three articles, we will explore the following components of our online IDE service.

  1. Worker Containers: Whenever a user submits code in a supported language, it is compiled and executed inside a worker Docker container. There will be one container image for every supported language.
  2. Taskmaster: The taskmaster’s job is to pick up execution tasks from a queue (a Pub/Sub topic) and spin up a worker container for each task. Once the task finishes, the worker container writes the output files to disk. The taskmaster reads the data from these files, puts the output information on another queue (another Pub/Sub topic), and then deletes the files.
  3. Back-end API: The back-end API handles all client interactions. This is the component that actually receives the code (and the custom input for the code) from the client and puts the task on the queue from which the taskmaster will pick it up. The back-end generates and returns a callback URL to the client. The client polls this callback URL continuously until it returns a non-pending status in the response.

Architecture

[Figure: IDE Architecture Diagram]

Looks simple, right? Now, let’s talk about the various components in brief.

  1. Workers

Each submitted task runs in a worker Docker container based on the Alpine Docker image. The worker container compiles and executes the code submitted by the user and saves its output to disk. If the program does not terminate within a set timeout duration, the worker kills the execution and exits.

Linux systems ship with binaries at /usr/bin/time and /usr/bin/timeout. The “time” binary can be used to track the execution time (and also the memory usage) of a program, and the “timeout” binary can be used to terminate a program if its execution exceeds a specified time limit.

The shell command would look something like this:

/usr/bin/timeout 5 sh -c '/usr/bin/time -f "%e" -o time.out --quiet python3 source.py 2> out.stderr 1> out.stdout'

The above command terminates the Python script if it does not finish within 5 seconds. However, I will be using a modified version of the GNU Time software that itself takes care of logging both the execution time and the memory usage, and also terminates the program on timeout. You can find my modified version of GNU Time at the following Git repository.

https://github.com/ifaisalalam/GNU-time/releases
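If you would rather not depend on a patched GNU Time binary, a similar time limit and wall-clock measurement can also be enforced from the process that launches the program. Here is a minimal Python sketch using only the standard library; the 5-second limit mirrors the shell command above, while the function name and the success/timeout status values are my own illustrative choices, not code from my repositories:

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout_s=5):
    """Run cmd, capture stdout/stderr, measure wall-clock time,
    and kill the process if it exceeds timeout_s seconds."""
    start = time.perf_counter()
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
        status = "success" if proc.returncode == 0 else "runtime-error"
        stdout, stderr = proc.stdout, proc.stderr
    except subprocess.TimeoutExpired as exc:
        # The child has been killed; whatever output it produced so far
        # is available on the exception object.
        status = "timeout"
        stdout = exc.stdout or b""
        stderr = exc.stderr or b""
    elapsed = time.perf_counter() - start
    return status, stdout, stderr, elapsed


# Example: run a small Python one-liner under the limit.
status, out, err, secs = run_with_timeout(
    [sys.executable, "-c", "print('hello')"])
```

Note that this measures wall-clock time only; the modified GNU Time binary also reports memory usage, which is why I use it in the actual workers.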

2. Taskmaster

The taskmaster instances run in an auto-scaling managed instance group on Compute Engine. Each instance has a pull-based subscription to the “ide-tasks” Pub/Sub topic and picks up and processes one task at a time. Why only one task at a time, you ask? Because each task is limited to a specified amount of memory and CPU. Since we will be deploying small GCE instances (to save cost), limiting the taskmaster to one task at a time ensures that the memory and CPU a task needs are actually available. You’re free to deploy a bigger instance and configure the taskmaster to process more than one task simultaneously.

Once a task arrives, the taskmaster creates a folder on the disk and copies the code and the custom input into two files in that folder. The taskmaster then spins up a temporary worker Docker container for the programming language specified in the message payload and mounts the folder it just created into this container.
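As a sketch of what that container launch might look like, the `docker run` invocation can be assembled as below. The image names, mount path and resource limits here are illustrative assumptions on my part, not the exact values from my repositories:

```python
# Hypothetical mapping from language name to worker image tag.
WORKER_IMAGES = {
    "python3": "ide-worker-python3:latest",
    "cpp": "ide-worker-cpp:latest",
}


def build_worker_command(language, task_dir):
    """Build a `docker run` command for one task. `--rm` discards the
    container afterwards, `--network none` cuts off internet access,
    and the memory/CPU flags cap what a single submission can use."""
    image = WORKER_IMAGES[language]
    return [
        "docker", "run", "--rm",
        "--network", "none",
        "--memory", "256m",
        "--cpus", "0.5",
        "-v", f"{task_dir}:/task",  # mount the folder with code + input
        image,
    ]


cmd = build_worker_command("python3", "/tmp/task-42")
# The taskmaster would then execute `cmd` with subprocess and wait for exit.
```

The `--network none` and `--rm` flags foreshadow the isolation hardening I’ll cover at the end of the series.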

The worker container compiles and runs the code until the program terminates or the execution times out. The worker container writes the compile-error logs, the program output and the execution time to three different files in the same mounted folder. Once this is done, the worker container exits and the taskmaster continues processing the output.

The taskmaster reads the outputs from the files written by the worker, prepares a JSON payload and publishes it to another Pub/Sub topic, “ide-results”. The taskmaster then deletes the request folder and continues processing other tasks (if there are any; otherwise it waits for a new task to arrive).
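The result message might be shaped roughly like this; the field names are my assumption for illustration (check the ide-taskmaster repository for the real schema):

```python
import json


def build_result_payload(request_id, stdout, stderr, exec_time, status):
    """Assemble the JSON message the taskmaster publishes to the
    'ide-results' topic after reading the worker's output files.
    Field names here are illustrative, not the repo's exact schema."""
    return json.dumps({
        "request_id": request_id,
        "status": status,   # success / timeout / compile-error / runtime-error
        "stdout": stdout,
        "stderr": stderr,
        "time": exec_time,  # seconds, as reported by GNU Time
    })


payload = build_result_payload("abc123", "hello\n", "", 0.02, "success")
```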

3. Client Back-end Instances

The back-end API instances run in an auto-scaling managed instance group on Compute Engine. Each instance runs an HTTP API server written in Node.js. The server saves the request as a document in a MongoDB collection, which generates a request ID (the _id property of the inserted document). The server then publishes the code, the custom input and the request ID to a Pub/Sub topic called “ide-tasks”.
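Conceptually, the submit handler does something like the following. I’ve sketched it in Python for brevity even though my back-end is written in Node.js, and the function names, message fields and URL path are illustrative assumptions:

```python
import json


def handle_submission(db_insert, publish, code, language, custom_input):
    """db_insert persists the request and returns the new document's _id;
    publish sends a message to the named topic. Returns the callback URL
    that the client should poll for the result."""
    request_id = db_insert({
        "code": code,
        "language": language,
        "input": custom_input,
        "status": "pending",
    })
    publish("ide-tasks", json.dumps({
        "request_id": request_id,
        "code": code,
        "language": language,
        "input": custom_input,
    }))
    return f"/api/result/{request_id}"


# Example with stand-ins for MongoDB and Pub/Sub:
published = []
url = handle_submission(lambda doc: "abc123",
                        lambda topic, msg: published.append((topic, msg)),
                        "print(1)", "python3", "")
```

Passing the database insert and the publish call in as functions keeps the sketch self-contained; in the real service these would be the MongoDB driver and the Pub/Sub client.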

Once the task is processed and its result is published to the “ide-results” Pub/Sub topic by the taskmaster, a push-based subscription delivers the message to the client back-end by making an HTTP POST request to an endpoint exposed by the client back-end service.

The client back-end receives the message payload and updates the MongoDB document with the received output and status of the task (success / timeout / compile-error / runtime-error).

When the client submits code to the back-end, it receives a callback URL. The client uses this callback URL to continuously poll the server until the server returns the output data (or until the maximum polling limit is reached).

3.1. MongoDB Database

The database I am using runs on a free MongoDB Atlas cluster. Of course, this will be a bottleneck in our service, since the free tier cannot serve a large number of simultaneous requests. I’ve chosen it to save cost; you can choose a bigger cluster tier while building your cluster on MongoDB Atlas.

I recommend choosing the option to deploy your MongoDB cluster on GCP in the same region as the client back-end servers, so as to have lower latency between the client back-end server and the database.

3.2. Redis

We will use Redis to implement rate-limiting on the client back-end server. We could also use MongoDB for this, but I chose not to because I do not want to overload the database cluster (being optimistic about the number of users who will use my IDE. LOL! xD). Also, by using Redis we can expect a slightly better response time from the API, for two reasons: first, Redis is faster; and second, Redis will be running in the same VPC as the back-end server instances.
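A fixed-window rate limiter of the kind we can build on Redis looks roughly like this. In the sketch the counters live in a plain dict so it stays self-contained; in production each key would be a Redis counter bumped with INCR and expired with a TTL. The window size and limit here are arbitrary example values:

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per `window_s` seconds per client."""

    def __init__(self, limit=10, window_s=60):
        self.limit = limit
        self.window_s = window_s
        self.counters = {}  # (client_id, window_number) -> request count

    def allow(self, client_id, now=None):
        """Return True if this request fits within the client's quota
        for the current time window."""
        now = time.time() if now is None else now
        window = int(now // self.window_s)  # which window are we in?
        key = (client_id, window)
        count = self.counters.get(key, 0) + 1
        self.counters[key] = count
        return count <= self.limit


limiter = FixedWindowRateLimiter(limit=2, window_s=60)
```

With Redis the per-window key would expire on its own, so old counters never need explicit cleanup.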

4. Global HTTP(S) Load Balancer

The global HTTP(S) load balancer routes incoming requests either to the Compute Engine group that runs the client back-end server (routes prefixed with `/api`) or to the Storage Bucket that serves the front-end (any other route).

We will implement the worker containers, the taskmaster and the client back-end over the next three articles and will automate deployment using TravisCI, GitLab and Terraform. We will also configure health checks and auto-healing to ensure high availability of all the services. I’ll also discuss how to secure the worker containers by isolating them from internet connectivity and by blocking inter-container network communication.

I’ll update the links to the next articles in this one as I publish them. Or, you can follow me on Medium at https://medium.com/@ifaisalalam/.

Sharing the link to my IDE here again: https://ide.ctfhub.io.

Code Reference

  1. Workers: https://github.com/ifaisalalam/ide-workers
  2. Modified GNU Time: https://github.com/ifaisalalam/GNU-time/releases
  3. Taskmaster: https://github.com/ifaisalalam/ide-taskmaster
  4. Client Back-end: https://github.com/ifaisalalam/ide-backend

Written by Faisal Alam

Software Development Engineer 2 at Amazon