I had a few problems getting up and running with docker-based gitlab-ci builds, so here’s a description of my setup. I’m using gitlab.com (not self-hosted) and their hosted CI at ci.gitlab.com BUT I am using private build runners on my own Ubuntu 14.04 server.
This is a bit vague because I did this a few weeks ago, but here's what I remember:
- Install gitlab-ci-multi-runner from the apt repo (Instructions)
- Install docker using the same instructions
- Enter your gitlab-ci URL and the key from the /runners page in gitlab-ci
- If you have execjs in your Gemfile.lock, you should specify your docker image as mwallasch/docker-ruby-node, since execjs needs a JavaScript runtime and that image bundles node. Otherwise you can use a plain ruby image.
- You can add tags on a runner via the web settings in CI. I couldn’t find how to do it from the config file.
- Speaking of config files, the multi-runner is installed as root so look at
/etc/gitlab-runner/config.toml. My final one looks kind of like this:
concurrent = 1

[[runners]]
  name = "docker-runner-1"
  url = "https://ci.gitlab.com/"
  token = "XXXXXXXXXXXXXXXXXXXXXX"
  limit = 1
  executor = "docker"
  [runners.docker]
    image = "mwallasch/docker-ruby-node"
    privileged = false
    volumes = ["/cache"]
    services = ["postgres:latest", "redis:latest"]
See that volume? The default config wizard added it; we'll use it later.
- Add .gitlab-ci.yml in your Rails project.
# Run before each script
before_script:
  - gem install bundler
  - touch log/application.log
  - touch log/test.log
  - bundle install --jobs $(nproc) --path=/cache/bundler
  - cp config/database.gitlab-ci.yml config/database.yml
  - "bundle exec rake db:create RAILS_ENV=test"
  - "RAILS_ENV=test bundle exec rake db:reset"

rspec:
  script: "bundle exec rspec"
- I added some tags here and set up my runner with matching tags in the CI web config. Note that I have a specific database.yml just for the CI environment, since we're using docker services for postgres and redis (more below).
- --jobs $(nproc) speeds up bundle installation by running parallel install jobs.
- --path=/cache/bundler puts the gems on the persistent cache volume configured in config.toml, so they don't have to be reinstalled on every build.
- Set up the database config. In config/database.gitlab-ci.yml:
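The file contents weren't shown, but here's a sketch of what such a config typically looks like with the docker postgres service (the service is reachable at the hostname `postgres` with the default `postgres` user; the database name below is just a placeholder for your app's test database):

```yaml
test:
  adapter: postgresql
  encoding: unicode
  host: postgres
  username: postgres
  password: ""
  database: myapp_test  # placeholder: use your app's test database name
```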
Took some pain to get here but I do like that I can own the hardware the builds run on.
On my typical Linux desktop I use ‘xbindkeys’ or my window manager’s built-in keyboard shortcut preferences to set up an ‘open new terminal window’ keystroke (Super-Enter usually). When I moved to OS X for work last year, this was something I really missed, and it took me a while to find the simplest way to achieve it.
The solution is some simple applescript, which I put in ~/Library/Scripts/new-terminal.applescript
tell application "Terminal"
    do script ""
    activate
end tell
Then I installed the utility ‘FastScripts’, at which point you can easily go into its preferences and set up a global keyboard shortcut for your script.
Here’s another applescript I use to:
- Collect user input (name of server to ssh into in this case)
- Run a utility bash script and pass in the argument
display dialog "Server name" default answer ""
set server_name to text returned of result
do shell script "ITERM_PROFILE=Default bash -l connect " & quoted form of server_name
Here’s a little Makefile I use with the Rails Pipeline we’re building at Harry’s. It builds .proto ProtocolBuffer definitions into .pb.rb Ruby classes using the ruby-protoc compiler.
To use, you would have ruby-protoc in your Gemfile. Set the GENDIR variable to point at the directory with your .proto files, then run make.
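The Makefile itself is linked rather than inlined, but it would look roughly like this. This is a sketch; in particular the compiler invocation (`rprotoc` and its flags) is an assumption on my part, so check your gem's documentation for the real binary name and options:

```make
# Where the .proto definitions live (override on the command line if needed)
GENDIR ?= lib/protos

PROTOS := $(wildcard $(GENDIR)/*.proto)
RUBIES := $(PROTOS:.proto=.pb.rb)

all: $(RUBIES)

# Assumption: the gem's compiler is invoked as `rprotoc`; adjust as needed
%.pb.rb: %.proto
	bundle exec rprotoc $< --out $(GENDIR)

clean:
	rm -f $(GENDIR)/*.pb.rb

.PHONY: all clean
```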
Here’s a little python class for interacting with the statsd admin interface. I wrote it at EnergyHub, where we used it alongside the nifty pystatsd module to clean up obsolete stats as we rotated our servers. Deleting obsolete stats prevents gauges, for example, from reporting their last value until statsd restarts.
It implements the admin commands to list stats: counters, timers and gauges; and the management commands to delete them: delcounters, deltimers and delgauges.
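As a minimal sketch (not the released class, just the general shape): statsd's admin interface listens on TCP port 8126 by default, takes newline-terminated commands, and terminates replies with an END marker.

```python
import socket


class StatsdAdmin(object):
    """Minimal client for the statsd admin interface (default port 8126)."""

    def __init__(self, host="localhost", port=8126):
        self.host = host
        self.port = port

    def _command(self, cmd):
        # One command per connection; statsd ends its reply with "END\n\n"
        sock = socket.create_connection((self.host, self.port))
        try:
            sock.sendall((cmd + "\n").encode("ascii"))
            data = b""
            while not data.endswith(b"END\n\n"):
                chunk = sock.recv(4096)
                if not chunk:
                    break
                data += chunk
            return data.decode("ascii")
        finally:
            sock.close()

    @staticmethod
    def parse_list(reply):
        # Strip the END marker and blank lines from a listing reply
        return [line for line in reply.splitlines() if line and line != "END"]

    def gauges(self):
        return self.parse_list(self._command("gauges"))

    def delgauges(self, pattern):
        return self._command("delgauges " + pattern)
```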
I miss having Vim around when I’m trying to develop a one-off script to run on Heroku. I found this plugin that installs vim but I thought it’d be great to just throw in busybox from the official binaries page. Here’s a modification to the vim plugin that does just that:
I just posted on the Harry’s engineering blog about a mock gateway for Authorize.net payment APIs, for use in RSpec unit tests and development Rails environments. We found that this sped up our test run at least 2x compared to using the Authorize.net sandbox server.
Another post over on the Harry’s engineering blog about testing asynchronous MooTools code in Jasmine.
I just released (as part of my work at Harry’s) an autoscaler for Heroku that uses Google Analytics realtime data to scale a Heroku app up or down.
Dynosaur: the core functionality in a gem with a command-line client.
Dynosaur Rails: packaged into an easy-to-deploy Rails app with web interface for configuration and status.
More info in my post on the Harry’s engineering blog.
We (EnergyHub) have just released a session replication plugin for Tomcat 6 using Amazon DynamoDB.
We’ve never used the bundled Tomcat clustering solutions, mainly because we run on EC2 and the multicast-based solution doesn’t work there. For about 18 months we’ve been using Memcached Session Manager (m-s-m), which stores the sessions in memcached. It works pretty well, but avoiding a single point of failure in a simple protocol like memcached is hard. m-s-m solves this problem by saving the session to a given memcached node and also a backup server. This has led to high complexity: in our case we counted 6 memcached calls for each web request. In defense of m-s-m, the author is very active on the mailing list and helped us debug a lot of corner case problems, often pushing out a release within a day or two of us reporting a problem. Also we were using the non-sticky configuration which is less well tested.
In the end, the sheer amount of code and the number of steps for each request in m-s-m made it hard to diagnose subtle intermittent bugs. We found David Dawson’s Mongo Tomcat Sessions plugin, which very cleanly loads a session from Mongo at the beginning of the request and saves it back at the end. This smartly puts the replication logic into the database layer. Mongo is great for easy-to-use replication, and we rely on it in production already, but for decent replication we’d be talking about two or three servers, which adds cost and maintenance burden. We figured we could take the general approach but use Amazon’s DynamoDB on the backend: no worries about deploying or monitoring the storage layer.
When a request comes in, we look up the session by ID in a Dynamo table; if it’s not found we start a new session. After the request, a helper Tomcat valve calls back to save the session to Dynamo. This approach works well; the only thing to consider before rushing to deploy is that Dynamo must be configured for a certain throughput. In our case, the throughput t that must be provisioned is

t = s * r

where s is the session size in kB, rounded up, and r is the request rate in requests per second. For example, if the vast majority of sessions are between 1 and 2 kB and we have a maximum request rate of 100 req/s, then we must provision the table for 2 * 100 = 200 read units and 200 write units.
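The same arithmetic in a couple of lines of Python (the function name is mine):

```python
import math


def provisioned_units(session_kb, requests_per_sec):
    # t = s * r, with the session size s rounded up to a whole kB
    return math.ceil(session_kb) * requests_per_sec
```

So `provisioned_units(1.5, 100)` gives the 200 read units from the example above (and the same number of write units, since every request both loads and saves the session).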
Session Expiration by Table Rotation
Moving to Dynamo loses us one key feature vs MongoDB: secondary indices. In this case, it means we can't have an index on the last modified time of the session, which could be used to delete expired sessions. We have two workarounds in Dynamo:
- Scan the whole table for sessions with an expired 'last modified time'. This is expensive and hard to provision for: if you have high session turnover you could have millions of sessions to scan through, but only hundreds of provisioned reads per second in which to do so.
- Move active sessions to a new table and drop the old one. This is the approach we have taken. For example, if the expiration time is one hour, we will start by saving our sessions into table 'A'. After one hour, we will create a new table, 'B', and start saving new sessions into B. When loading an existing session, we will look in 'B', and if not found, in 'A'. In this manner active sessions will be moved to table B over the next hour. After another hour, we will create table 'C' and start saving there. At this point all sessions that only exist in table 'A' are older than one hour and can be safely dropped, so we delete the whole table.
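The rotation scheme above can be sketched in a few lines of Python. This uses a plain dict of dicts as a stand-in for Dynamo, and epoch-numbered table names instead of 'A'/'B'/'C' (an assumption of mine; the logic is the same):

```python
# Stand-in "Dynamo": {table_name: {session_id: session_data}}

def table_for(now, period, prefix="sessions"):
    """Name of the table new sessions are written to right now."""
    return "%s-%d" % (prefix, now // period)


def save_session(store, session_id, session, now, period):
    store.setdefault(table_for(now, period), {})[session_id] = session


def load_session(store, session_id, now, period):
    """Look in the current table first, then the previous one.

    A hit in the previous table is migrated forward, so active sessions
    drift into the current table over the course of one period.
    """
    current = table_for(now, period)
    previous = table_for(now - period, period)
    session = store.get(current, {}).get(session_id)
    if session is None:
        session = store.get(previous, {}).get(session_id)
        if session is not None:
            store.setdefault(current, {})[session_id] = session
    return session


def drop_expired_tables(store, now, period):
    """Tables older than the previous one hold only expired sessions."""
    keep = {table_for(now, period), table_for(now - period, period)}
    for name in list(store):
        if name not in keep:
            del store[name]
```

In the real plugin "dropping a table" is a single DeleteTable call, which is much cheaper than scanning and deleting millions of individual expired items.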
Extra Features / Advanced Settings
- We've added optional monitoring via statsd, which we use heavily in production.
- We auto-provision the read-write capacity when we create tables, based on a given session size and request rate.
- We've added the ability to ignore requests by URI or based on the presence of certain HTTP headers. This is useful for us because we have a lot of machine-generated traffic that doesn't use sessions.
See the code at https://github.com/werkshy/dynamo-session-manager or use the jars from http://repo1.maven.org/maven2/net/energyhub/dynamo-session-manager/
I thought I’d take another shot at reducing our build times. When we test our full legacy code, there are a lot of slow integration tests involving MySQL. I looked at using an in-memory database like H2-with-MySQL-syntax but some of our code (e.g. table creation) is too MySQL-specific. Next step: use a RAM disk for MySQL. This is all based on Ubuntu 12.04.
Here’s a script that starts MySQL with the parameters to use /dev/shm for all files, and bootstraps in the root user and system tables. I have verified using iotop and iostat that nothing is written to actual disk with these settings.
As for performance? A full test run of our main data access library has gone from 4:11 to 3:41, about 12% faster. Not much really!