I miss having Vim around when I’m trying to develop a one-off script to run on Heroku. I found this plugin that installs vim but I thought it’d be great to just throw in busybox from the official binaries page. Here’s a modification to the vim plugin that does just that:
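A rough sketch of the kind of change, assuming the standard buildpack bin/compile interface (the busybox URL, version and applet list here are illustrative, not the plugin's actual code):

# In the buildpack's bin/compile: fetch a static busybox and put it on the dyno's PATH
BUILD_DIR="$1"
BUSYBOX_URL="https://busybox.net/downloads/binaries/1.21.1/busybox-x86_64"

mkdir -p "$BUILD_DIR/bin"
curl -sL "$BUSYBOX_URL" -o "$BUILD_DIR/bin/busybox"
chmod +x "$BUILD_DIR/bin/busybox"

# Symlink the applets we actually want (vi, less, top, ...)
for applet in vi less top free; do
  ln -sf busybox "$BUILD_DIR/bin/$applet"
done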
I just posted on the Harry’s engineering blog about a mock gateway for the Authorize.net payment APIs, for use in RSpec unit tests and development Rails environments. We found that this sped up our test run at least 2X compared to using the Authorize.net sandbox server.
Another post over on the Harry’s engineering blog about testing asynchronous MooTools code in Jasmine.
I just released (as part of my work at Harry’s) an autoscaler for Heroku that uses Google Analytics realtime data to scale a Heroku app up or down.
More info in my post on the Harry’s engineering blog.
We’ve never used the bundled Tomcat clustering solutions, mainly because we run on EC2 and the multicast-based solution doesn’t work there. For about 18 months we’ve been using Memcached Session Manager (m-s-m), which stores the sessions in memcached. It works pretty well, but avoiding a single point of failure in a simple protocol like memcached is hard. m-s-m solves this problem by saving the session to a given memcached node and also a backup server. This has led to high complexity: in our case we counted 6 memcached calls for each web request. In defense of m-s-m, the author is very active on the mailing list and helped us debug a lot of corner case problems, often pushing out a release within a day or two of us reporting a problem. Also we were using the non-sticky configuration which is less well tested.
In the end, the sheer amount of code and the number of steps for each request in m-s-m made it hard to diagnose subtle intermittent bugs. We found David Dawson’s Mongo Tomcat Sessions plugin, which very cleanly loads a session from Mongo at the beginning of the request and saves it back at the end. This smartly pushes the replication logic down into the database layer. Mongo is great for easy-to-use replication, and we rely on it in production already, but for decent replication we’d be talking about two or three servers, which adds cost and maintenance burden. We figured we could take the same general approach but use Amazon’s DynamoDB as the backend: no worries about deploying or monitoring the storage layer.
When a request comes in, we look it up by ID in a Dynamo table. If it’s not found we start a new session. After the request, a helper Tomcat valve calls back to save the session to Dynamo. This approach works well; the only thing to consider before rushing to deploy is that Dynamo must be provisioned for a certain throughput. In our case, the throughput t that must be provisioned is

t = s * r

where s is the session size (rounded up to the nearest kB) and r is the request rate in requests per second. For example, if the vast majority of sessions are between 1 and 2 kB and we have a maximum request rate of 100 req/s, then we must provision the table for 2 * 100 = 200 read units and 200 write units.
Session Expiration by Table Rotation
Moving to Dynamo loses us one key feature vs MongoDB: secondary indexes. In this case, it means we can't have an index on the last-modified time of the session, which could be used to delete expired sessions. We have two workarounds in Dynamo:
- Scan the whole table for sessions with an expired 'last modified time'. This is expensive and hard to provision for: if you have a high session turnover you could have millions of sessions to scan through, but only hundreds of provisioned reads per second in which to do so.
- Move active sessions to a new table and drop the old one. This is the approach we have taken (sketched below). For example, if the expiration time is one hour, we will start by saving our sessions into table 'A'. After one hour, we will create a new table, 'B', and start saving new sessions into B. When loading an existing session, we will look in 'B', and if not found, in 'A'. In this manner active sessions will be moved to table B over the next hour. After another hour, we will create table 'C' and start saving there. At this point all sessions that only exist in table 'A' are older than one hour and can be safely dropped, so we delete the whole table.
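To make that timeline concrete, here is roughly what one rotation step looks like, expressed with the AWS CLI. The session manager does this internally via the Java SDK; the table-naming scheme, key schema and capacities below are illustrative assumptions, not the library's actual names.

# Name tables by hour bucket (illustrative naming scheme)
HOUR=$(( $(date +%s) / 3600 ))

# Create the new "current" table, keyed by session ID
aws dynamodb create-table \
  --table-name "tomcat-sessions-$HOUR" \
  --attribute-definitions AttributeName=id,AttributeType=S \
  --key-schema AttributeName=id,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=200,WriteCapacityUnits=200

# Any session that still lives only in the table from two periods ago has
# not been touched for over an hour, so that whole table can be dropped
aws dynamodb delete-table --table-name "tomcat-sessions-$(( HOUR - 2 ))"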
Extra Features / Advanced Settings
- We've added optional monitoring via statsd, which we use heavily in production.
- We auto-provision the read-write capacity when we create tables, based on a given session size and request rate.
- We've added the ability to ignore requests by URI or based on the presence of certain HTTP headers. This is useful for us because we have a lot of machine-generated traffic that doesn't use sessions.
See the code at https://github.com/werkshy/dynamo-session-manager or use the jars from http://repo1.maven.org/maven2/net/energyhub/dynamo-session-manager/
I thought I’d take another shot at reducing our build times. When we test our full legacy code, there are a lot of slow integration tests involving MySQL. I looked at using an in-memory database like H2-with-mysql-syntax but some of our code (e.g. table creation) is too mysql-specific. Next step: use a RAM disk for MySQL. This is all based on Ubuntu 12.04.
Here’s a script that starts MySQL with the parameters to use /dev/shm for all files, and bootstraps in the root user and system tables. I have verified using iotop and iostat that nothing is written to actual disk with these settings.
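In outline it boils down to something like this; a sketch rather than the exact script, assuming MySQL 5.5 as shipped with Ubuntu 12.04 (the datadir, socket and port are illustrative):

# Put every MySQL file (data, temp, socket, pid) on the /dev/shm ramdisk
DATADIR=/dev/shm/mysql-ram
SOCKET=$DATADIR/mysql.sock

rm -rf $DATADIR
mkdir -p $DATADIR

# Bootstrap the system tables and root user into the ramdisk datadir
mysql_install_db --user=$USER --datadir=$DATADIR

# Start mysqld pointing everything at the ramdisk
mysqld --no-defaults \
  --datadir=$DATADIR \
  --tmpdir=$DATADIR \
  --socket=$SOCKET \
  --pid-file=$DATADIR/mysqld.pid \
  --port=3307 &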
As for performance? A full test run of our main data access library has gone from 4:11 to 3:41, about 11% faster. Not much really!
A.K.A. Keeping Your WordPress in Git
As a web application developer I’m used to having several environments to deploy to: my local workstation, the QA testing environment and our production environment. I’m also accustomed to keeping everything in version control: code, config and deployment scripts. As we prepare a new release it spends time in the QA environment and when testing is complete we move it to production. The method for deploying to QA is very similar to how we deploy to production, since we want to catch bugs in the deployment process itself.
This technique is not obviously applied to WordPress deployment. Over the years I have developed a technique for hosting a WordPress ‘development’ environment for our marketing and frontend webdev people to work on before it is released to the public. We keep all the changes in git and deploy directly from git in one command. I haven’t seen any other great solutions to the problem that a lot of your content lives in the database while a whole bunch of stuff also lives in the theme files (php and js), so you need to ‘deploy’ the database changes alongside the file changes. Here’s my take on that.
This technique BLOWS AWAY the production database during deployment. It is therefore not useful if you have comments enabled in WordPress. We use WordPress more like a CMS than a blog so we are free to replace the database when we deploy. The technique could probably be adapted to only deploy the essential tables (pages, posts etc) and leave the comments table alone.
Let’s assume the development environment is at /var/www/dev and the production environment is at /var/www/prod.
To ‘checkin’ the dev version
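Run the checkin script from wp-deploy. Roughly, it has to snapshot the database into the working tree and push everything; a hypothetical sketch of that flow (the script contents, dump path and credentials here are illustrative, not the actual wp-deploy code):

cd /var/www/dev
mysqldump -u wordpress -pwordpress wpdev > db/dump.sql   # snapshot the dev DB into the repo
git add -A
git commit -m "Content update from dev"
git push origin master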
To ‘checkout’ the current version into production
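Run the checkout script from wp-deploy, which pulls the latest commit and loads the dump over the production database. Again a hypothetical sketch with illustrative paths:

cd /var/www/prod
git pull origin master
# This BLOWS AWAY the production database, as noted above
mysql -u wordpress -pwordpress wpprod < db/dump.sql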
Download the scripts from https://github.com/werkshy/wp-deploy and copy them to /usr/local/bin, which should be in your $PATH.
Everything is checked into git: wordpress files, themes, plugins, db dumps, everything.
Install wordpress in the dev environment
Download and unzip the wordpress release at /var/www/dev.
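One way to do that, using the standard WordPress download URL (adjust version and paths to taste):

cd /var/www
wget https://wordpress.org/latest.tar.gz
tar xzf latest.tar.gz
mv wordpress dev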
You’ll need to set up the dev database.
mysql -uroot -p
mysql> create database wpdev;
mysql> grant all on wpdev.* to wordpress identified by 'wordpress';
Set the db parameters in wp-config.php. THIS WILL NOT BE CHECKED IN.
Edit .gitignore, most importantly to block wp-config.php.
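A minimal version is just that one entry (add anything else environment-specific that shouldn’t be deployed):

# Keep the per-environment config out of the repo
echo "wp-config.php" >> .gitignore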
Set up your webserver to serve php from that directory as normal (see the example Apache configs at the end of this post).
Add Everything To Git
git add -A
git commit -m "initial commit"
Create the ‘origin’ repository
You may keep your site on a remote git repo, or in a git repo on the local machine.
Create the ‘origin’ repository:
git init --bare /root/wp.git
Push your dev commit to the origin
git remote add origin /root/wp.git
git push origin master
Prepare the production environment
Checkout the files:
git clone /root/wp.git /var/www/prod
cd /var/www/prod
cp /var/www/dev/wp-config.php .
Create the production database (use the same user as the dev one)
mysql -uroot -p
mysql> create database wpprod;
mysql> grant all on wpprod.* to wordpress;
Set the production db name in wp-config.php
Now try loading the db dump into production:
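Assuming the checkin step wrote the dump to db/dump.sql inside the repo (use whatever path your checkin script actually writes):

mysql -u wordpress -pwordpress wpprod < /var/www/prod/db/dump.sql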
If that all works, you can now dump and push the dev site with the checkin script, and deploy the production site from git with the checkout script.
Example Apache Config
<VirtualHost *:80>
    ServerName dev.energyhub.com
    DocumentRoot /var/www/dev
    <Directory "/var/www/dev">
        AllowOverride All
    </Directory>
</VirtualHost>

<VirtualHost *:80>
    ServerName www.energyhub.com
    DocumentRoot /var/www/prod
    <Directory "/var/www/prod">
        AllowOverride All
    </Directory>
</VirtualHost>
I just released ‘sleeper’, a little utility script to suspend your computer if you are running a lightweight window manager like Awesome or Xmonad.
Here’s Picflick v1.3.3.
New feature: much simplified single-script setup, getting rid of the picflick_starter wrapper script. The button now calls the picflick script which re-launches itself in a terminal window so you can see the progress. Much easier to understand and configure.
- The “make install” step failed when the Picasa buttons directory did not already exist. Now fixed. (As reported by Jeff Bloemink).
- Only use ‘urxvt’ if available, not ‘rxvt’, since the latter does not have the -hold option (thanks again to Jeff Bloemink).
- Fixed a typo in the xterm command line (thanks to “Dr AKULAvitch”).
- Fixed bug when using AUTH_TOKEN in picflick script instead of ~/.flickrrc (thanks to Mathieu).
Thanks for the bug reports guys. Keep ‘em coming.
Why Claws Mail?
I’ve been suffering more and more recently on my old Thinkpad maxed out at 1GB of RAM. Also I’ve been feeling the need to use a real mail client after a few
months of having two GMail windows open (work + personal). Trusty old
Thunderbird uses 40+MB of RAM on this machine for three IMAP accounts, using a couple of crucial extensions. 40MB is a large chunk of my precious memory, considering that I’m already using two instances of Firefox (one for browsing, one for web development). If the memory usage hits 1GB then everything grinds for a couple of minutes (swap is so evil on laptops!) until I can kill one of those Firefoxes, so all of my apps have to justify themselves against low memory alternatives. Claws uses about 6MB, so I’m using it for now.
Using Claws Mail with GMail
Claws is remarkably capable as a GMail IMAP client these days. Naturally it supports IMAP over SSL and SMTP over SSL with TLS, which is required for GMail. It also has two features which Thunderbird only supports through extensions or about:config magic settings:
- You can set Trash to be [Gmail]/Trash.
- You can set up a shortcut key to archive emails. This isn’t obvious so here’s how:
- Create a label in Gmail called ‘archived’. This is just a label where you can put stuff so it isn’t in the inbox (“inbox” in Gmail is just a label too)
- Go to Configuration/Actions.
- Add a new action with Menu Name “Archive” and, for the command, use a filter action.
- Edit the filter action: set Action = Move and Destination = archived.
- Save the action. You should now have the action available in the menu under Tools/Actions/Archive and can check that it works.
- Now, to set a shortcut key, go to Configuration/Preferences/Other/Miscellaneous and set “Enable customisable keyboard shortcuts”. Then go to Tools/Actions, and with the “archive” action highlighted press ‘Y’ to set the keyboard shortcut.
Tell Claws not to save sent mail, because using Google’s SMTP puts a copy in your sent folder anyway.
Set [Gmail]/Sent Mail to type ‘outbox’ and you can delete the other ‘Sent’ folder, plus you get a nice icon on the sent mail folder. You can do the same with Drafts and Trash.