As part of my day job, I write and compile a lot of code. My laptop is not that powerful, and I find myself wasting a lot of time on compilation. So I asked myself: why not use the cloud to get more compute power?
Choosing cloud provider and instance type
My best fit should be a compute optimized instance (so the code will compile faster) with at least 8 CPU cores and 16 GB of RAM.
I have some experience with Azure and AWS (and we already have a VPN there), so I looked for the best fit in the closest region (to avoid high latency):
- In Azure, the VM size that meets my requirements is “F8s”, which offers 8 CPU cores and 16 GB of RAM and is based on the Intel Xeon® E5-2673 v3 (Haswell) processor (you can review the pricing here). Its price is 0.453$ / hour.
- In AWS, the instance type that meets my requirements is “c4.2xlarge”, which offers 8 CPU cores and 15 GB of RAM and is based on the Intel Xeon® E5-2666 v3 (Haswell) processor (you can review the pricing here). Its price is 0.453$ / hour.
The price is the same and the processors come from the same family, but the AWS instance has 1 GB less RAM (I guess I can live with that). Moreover, by taking a few simple steps to separate the “stateful data” from the instance, I can use AWS spot instances to save some money.
Reviewing the AWS spot pricing for the c4.2xlarge instance type, I see that the current market price is stable at about 0.111$ / hour, which means I can save about 75% of the on-demand price!
A quick calculation gives a monthly cost of around 0.111$ (per hour) X 9 (hours a day) X 22 (days) ≈ 22$ per month, a small price for the time I’ll save.
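The arithmetic above can be double-checked with a few lines of Python. This is only a sketch: the function name and its defaults are made up for illustration, and the prices, the 9 hours a day and the 22 working days are simply the figures quoted in this post.

```python
def monthly_cost(price_per_hour, hours_per_day=9, days_per_month=22):
    """Estimated monthly cost (in $) of an instance that only runs during work hours."""
    return price_per_hour * hours_per_day * days_per_month


# Prices quoted above for c4.2xlarge (in $ per hour)
ON_DEMAND_PRICE = 0.453
SPOT_PRICE = 0.111

on_demand_monthly = monthly_cost(ON_DEMAND_PRICE)
spot_monthly = monthly_cost(SPOT_PRICE)

# Fraction of the on-demand price saved by using spot instances
saving = 1 - SPOT_PRICE / ON_DEMAND_PRICE

print("On-demand: %.2f$ per month" % on_demand_monthly)
print("Spot:      %.2f$ per month" % spot_monthly)
print("Saving:    %.0f%%" % (saving * 100))
```

Spot prices move with the market, so the result is an estimate rather than a guarantee.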
Each time I start a new cloud instance, I want it to have the same IP address, so I created an ENI (Elastic Network Interface) that I keep and assign to each instance I create. If the instance is terminated (by me or by AWS), the ENI remains “alive”, so I can reuse it the next time I start a new instance.
Our code base is stored in a private Git repository. I have a clone of this repository on my laptop, and that is where I work. To compile and run tests on my local code base without pushing to the remote branch every time I change something, I need a clone of my local repository on the cloud instance. To make sure that clone isn’t deleted each time the instance is terminated, I created an EBS (Elastic Block Store) volume that I can attach to my instance and mount when needed. Once the instance is terminated, the EBS volume becomes “available” again and I can re-attach it to another instance.
Now I need to prepare my build environment. EC2 (Elastic Compute Cloud) instances are created from an AMI (Amazon Machine Image), so I took the latest AWS Ubuntu 16.04 AMI and launched an instance from it. I connected to the instance (using SSH) and installed everything I need (compiler, Python packages, tools, etc.). Once everything was installed and ready, I stopped the instance and captured an AMI from it.
What I have ready now is:
- An ENI that I can attach to my instance to preserve the IP address.
- An EBS volume that contains my Git repository (so when I launch a new instance I don’t need to clone the whole repository, just run incremental fetches).
- An AMI with my whole environment, ready to be deployed.
Automating the process
As you know, software engineers are lazy creatures, so I wrote a Python script that automates the whole process of starting a new builder for me. The script should:
- Create a new spot request for a c4.2xlarge instance from my AMI (using my ENI).
- Make sure the spot request is fulfilled and wait until the instance is available for SSH connections.
- Attach my EBS volume and mount it inside the instance.
The best way (AFAIK) to work with AWS from Python is using the boto library.
sudo pip install boto
The relevant part of the script, which starts a new spot request, looks like this:
# Excerpt from the builder class; assumes "import boto.ec2.networkinterface" and a module-level logger.
def requestSpotInstance(self):
    # Reuse the pre-created ENI as the primary interface and keep it when the instance is terminated
    eni = boto.ec2.networkinterface.NetworkInterfaceSpecification(
        network_interface_id=self.settings.EniId,
        device_index=0,
        delete_on_termination=False)
    network_interfaces = boto.ec2.networkinterface.NetworkInterfaceCollection(eni)

    logger.info("Requesting spot instance of type %s" % self.settings.InstanceType)
    requests = self.ec2.request_spot_instances(
        price=self.settings.Price,
        image_id=self.settings.AmiId,
        instance_type=self.settings.InstanceType,
        availability_zone_group=self.settings.Region,
        placement=self.settings.AvailabilityZone,
        key_name=self.settings.KeyPair,
        network_interfaces=network_interfaces)

    # request_spot_instances returns a list of spot requests; only one was requested
    sir_id = requests[0].id
    instance_id = self.getInstanceIdFromSpotRequest(sir_id, timeout=120)
    if instance_id is None:
        logger.warning("Spot request %s completed with failure, canceling the request" % sir_id)
        self.helper.cancelSpotRequest(sir_id)
        return None

    logger.info("Spot request completed, instance id: %s" % instance_id)
    instance = self.prepareInstance(instance_id)
    return instance
As you can see, using the boto library is simple enough: I create a NetworkInterfaceSpecification object with my pre-created ENI and start a new spot instance request, providing all the details of the instance (keys, type, region, etc.). After the instance is created, we tag it and attach the pre-created EBS volume in separate API calls (implemented in prepareInstance):
def prepareInstance(self, instance_id):
    instance = self.getInstanceObject(instance_id)

    logger.info("Waiting for instance %s to be running" % instance_id)
    is_running = self.waitForRunningState(instance, timeout=120)
    if not is_running:
        logger.warning("Instance %s is not running, terminating the instance" % instance_id)
        self.terminateInstance(instance_id)
        return None

    logger.info("Tagging instance %s" % instance_id)
    self.tagInstance(instance, self.settings.InstanceName)

    # The data volume holds the git clone and survives instance termination
    logger.info("Attaching volume %s" % self.settings.DataVolume)
    attached = self.ec2.attach_volume(volume_id=self.settings.DataVolume,
                                      instance_id=instance_id,
                                      device="/dev/sdf")
    if not attached:
        logger.warning("Failed attaching volume %s to instance %s" % (self.settings.DataVolume, instance_id))
        self.terminateInstance(instance_id)
        return None

    logger.info("Instance %s was successfully created (ip: %s)" % (instance_id, instance.private_ip_address))
    return instance

def tagInstance(self, instance, name):
    if instance is None:
        return False

    # update() refreshes the instance data and returns its current state
    status = instance.update()
    if status.lower() != "running":
        logger.warning("Cannot tag instance %s in status: %s" % (instance.id, status))
        return False

    instance.add_tag("Name", name)
    instance.add_tag("Owner", self.settings.InstanceOwner)
    return True
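The snippets above rely on getInstanceIdFromSpotRequest, which is not shown here. A minimal sketch of how such a polling helper could look is below. It is not the implementation from the full script: the function name, the polling interval and the injectable sleep parameter are assumptions made for illustration, and conn is assumed to be a boto 2 EC2 connection like self.ec2.

```python
import time


def wait_for_spot_instance(conn, sir_id, timeout=120, interval=5, sleep=time.sleep):
    """Poll a spot request until it is fulfilled.

    Returns the instance id, or None if the request ended in a terminal
    state or was not fulfilled within `timeout` seconds.
    """
    waited = 0
    while waited <= timeout:
        requests = conn.get_all_spot_instance_requests(request_ids=[sir_id])
        if requests:
            request = requests[0]
            # "active" means the request was fulfilled and an instance exists
            if request.state == "active" and request.instance_id:
                return request.instance_id
            # Terminal states: the request will never be fulfilled
            if request.state in ("cancelled", "closed", "failed"):
                return None
        # Still "open": wait a bit and ask again
        sleep(interval)
        waited += interval
    return None
```

Passing the sleep function as a parameter is just a convenience that lets the helper be exercised with a fake connection and no real waiting.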
After starting a new instance, we need to SSH into it and mount the data disk. I found a nice SSH library for Python called paramiko, which implements the SSHv2 protocol, and wrote a simple wrapper to help me use it:
import logging
import paramiko

logger = logging.getLogger()


class SSHWrapper(object):
    def __init__(self):
        self.ssh = paramiko.SSHClient()
        # Builder instances are short-lived, so unknown host keys are accepted automatically
        self.ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        self.hostname = ""
        self.connected = False

    def __del__(self):
        if self.connected:
            self.disconnect()

    def connect(self, hostname, username, pem, timeout=60):
        key = paramiko.RSAKey.from_private_key_file(pem)
        try:
            logger.debug("Connecting to %s" % hostname)
            self.ssh.connect(hostname, username=username, pkey=key, timeout=timeout)
            logger.debug("Successfully connected to %s" % hostname)
            self.hostname = hostname
            self.connected = True
            return True
        except Exception as e:
            logger.warning("Failed connecting to %s: %s" % (hostname, e))
            self.hostname = ""
            self.connected = False
            return False

    def disconnect(self):
        if self.connected:
            logger.debug("Disconnecting from %s" % self.hostname)
            self.ssh.close()
            self.hostname = ""
            self.connected = False

    def execute(self, cmd, show_output=True, throw_on_error=True):
        if not self.connected:
            logger.error("Cannot execute ssh command - not connected")
            return 255

        logger.info("Running on remote host %s: '%s'" % (self.hostname, cmd))
        _, stdout, stderr = self.ssh.exec_command(cmd)
        # Block until the remote command finishes and fetch its exit code
        rc = stdout.channel.recv_exit_status()
        logger.info("Command returned rc=%d" % rc)

        if show_output:
            for line in stdout:
                logger.debug("STDOUT - %s" % line)
            for line in stderr:
                logger.debug("STDERR - %s" % line)

        if throw_on_error and rc != 0:
            raise Exception("Command failed, rc=%d" % rc)
        return rc
Don’t forget to install the library:
sudo pip install paramiko
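With a wrapper like this, mounting the data disk comes down to running a couple of remote commands. The sketch below is an illustration rather than the code from the full script. The function names and the /data mount point are made up, and the device name is an assumption: a volume attached as /dev/sdf typically shows up as /dev/xvdf on Ubuntu on this instance family, but check with lsblk on your own instance. The ssh argument is any object with an execute(cmd) method, such as the SSHWrapper above.

```python
def build_mount_commands(device="/dev/xvdf", mount_point="/data"):
    """Return the shell commands that mount the data volume (idempotent)."""
    return [
        # Create the mount point if it does not exist yet
        "sudo mkdir -p %s" % mount_point,
        # Mount only if nothing is mounted there already
        "mountpoint -q %s || sudo mount %s %s" % (mount_point, device, mount_point),
    ]


def mount_data_volume(ssh, device="/dev/xvdf", mount_point="/data"):
    """Run the mount commands over an already connected SSH wrapper."""
    for cmd in build_mount_commands(device, mount_point):
        ssh.execute(cmd)
```

Because execute raises on a non-zero exit code by default, a failed mount stops the flow instead of silently leaving the builder without its repository.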
You can find the full script in my GitHub repository. Feel free to use it for your own needs.
How do I use it?
On my laptop, I keep the latest commit named “DO NOT PUSH – WIP”, and after making code changes I fold them into it using the “git commit --amend” command.
Then I connect to the builder and run the following command:
git fetch && git reset --hard origin/master
That updates my git repository with the changes made on my laptop so I can compile and test whatever I want.
After finishing what I wanted to do, I go back to the laptop and run “git reset --soft HEAD^”, so git removes the “DO NOT PUSH – WIP” commit and leaves my latest changes in place as uncommitted (staged) changes. I can then create a real commit to push to the remote repository we all work with.