Dynamo Session Manager

We (EnergyHub) have just released a session replication plugin for Tomcat 6 using Amazon DynamoDB.

Motivation

We’ve never used the bundled Tomcat clustering solutions, mainly because we run on EC2 and the multicast-based solution doesn’t work there. For about 18 months we’ve been using Memcached Session Manager (m-s-m), which stores the sessions in memcached. It works pretty well, but avoiding a single point of failure with a simple protocol like memcached is hard. m-s-m solves this problem by saving each session to a given memcached node and also to a backup server. This has led to high complexity: in our case we counted 6 memcached calls for each web request. In defense of m-s-m, the author is very active on the mailing list and helped us debug a lot of corner-case problems, often pushing out a release within a day or two of us reporting a problem. Also, we were using the non-sticky configuration, which is less well tested.

In the end, the sheer amount of code and the number of steps for each request in m-s-m made it hard to diagnose subtle intermittent bugs. We found David Dawson’s Mongo Tomcat Sessions plugin, which very cleanly loads a session from Mongo at the beginning of the request and saves it back at the end. This smartly pushes the replication logic down into the database layer. Mongo is great for easy-to-use replication, and we already rely on it in production, but for decent replication we’d be talking about two or three servers, which adds cost and maintenance burden. We figured we could take the same general approach but use Amazon’s DynamoDB on the backend: no worries about deploying or monitoring the storage layer.

Implementation

When a request comes in, we look up its session by ID in a Dynamo table. If it’s not found, we start a new session. After the request, a helper Tomcat valve saves the session back to Dynamo. This approach works well; the only thing to consider before rushing to deploy is that a Dynamo table must be provisioned for a certain throughput. In our case, the throughput t that must be provisioned is

t = s*r,

where s is the session size rounded up, in kB, and r is the request rate (requests per second). Since each request does one session read and one session write, t applies to both read and write capacity. For example, if the vast majority of sessions are between 1 and 2 kB and we have a maximum request rate of 100 req/s, then we must provision the table for 2*100 = 200 read units and 200 write units.
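The provisioning rule can be sketched as a small bash helper. It is only the formula above, t = s*r, with the session size rounded up to whole kB; the function name `dynamo_units` is hypothetical, and it assumes 1 kB = 1024 bytes and one read plus one write per request.

```bash
#!/usr/bin/env bash
# Hypothetical helper: capacity units needed for the session table,
# following t = s*r with s rounded up to whole kB (1 kB assumed = 1024 bytes).
# The same number is used for both read and write units.
dynamo_units() {
  local session_bytes=$1   # typical session size in bytes
  local req_per_sec=$2     # peak request rate
  # Integer ceiling division: round the session size up to whole kB.
  local s=$(( (session_bytes + 1023) / 1024 ))
  echo $(( s * req_per_sec ))
}

# The example from the text: ~1.5 kB sessions at 100 req/s.
dynamo_units 1536 100   # prints 200
```

Rounding up rather than averaging matters here: a 1.1 kB session costs the same as a 2 kB one under this rule.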

Session Expiration by Table Rotation

Moving to Dynamo loses us one key feature vs MongoDB: secondary indices. In this case, it means we can't have an index on the last modified time of the session, which could be used to delete the expired sessions. We have two workarounds in Dynamo:

  1. Scan the whole table for sessions whose 'last modified time' has expired. This is expensive and hard to provision for: if you have a high session turnover, you could have millions of sessions to scan through, but only hundreds of provisioned reads per second in which to do so.
  2. Move active sessions to a new table and drop the old one. This is the approach we have taken. For example, if the expiration time is one hour, we will start by saving our sessions into table 'A'. After one hour, we will create a new table, 'B', and start saving new sessions into B. When loading an existing session, we will look in 'B', and if not found, in 'A'. In this manner active sessions will be moved to table B over the next hour. After another hour, we will create table 'C' and start saving there. At this point all sessions that only exist in table 'A' are older than one hour and can be safely dropped, so we delete the whole table.
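The rotation bookkeeping in option 2 can be sketched in bash. This is a hypothetical illustration, not the plugin's code: it assumes tables are named with a prefix plus a period counter (e.g. `sessions_2`), where the counter is the Unix time divided by the expiration period. Writes go to the current table, reads try the current table and then the previous one, and any table two or more periods old holds only expired sessions.

```bash
#!/usr/bin/env bash
# Hypothetical naming scheme: <prefix>_<period index>, where the index is
# floor(unix_time / period_seconds). Not the plugin's actual naming.
PREFIX=sessions
PERIOD=3600   # expiration time in seconds (one hour, as in the example)

# Table that sessions are saved into at time $1.
write_table() {
  echo "${PREFIX}_$(( $1 / PERIOD ))"
}

# Tables to try, in order, when loading a session at time $1:
# the current table first, then the previous one.
read_tables() {
  local idx=$(( $1 / PERIOD ))
  echo "${PREFIX}_${idx} ${PREFIX}_$(( idx - 1 ))"
}

# Succeeds if the table with index $2 may be dropped at time $1: two newer
# tables exist, so any session found only in it was last saved more than
# one full period ago.
can_drop() {
  local idx=$(( $1 / PERIOD ))
  [ "$2" -le $(( idx - 2 )) ]
}
```

With PERIOD=3600, a request at Unix time 7300 saves to `sessions_2` and loads from `sessions_2`, then `sessions_1`; at that point `sessions_0` plays the role of table 'A' and can be dropped.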

Extra Features / Advanced Settings

  • We've added optional monitoring via statsd, which we use heavily in production.
  • We auto-provision the read-write capacity when we create tables, based on a given session size and request rate.
  • We've added the ability to ignore requests by URI or based on the presence of certain HTTP headers. This is useful for us because we have a lot of machine-generated traffic that doesn't use sessions.

See the code at https://github.com/werkshy/dynamo-session-manager or use the jars from http://repo1.maven.org/maven2/net/energyhub/dynamo-session-manager/

Running MySQL/InnoDB in-memory for unit tests

I thought I’d take another shot at reducing our build times. When we test our full legacy code, there are a lot of slow integration tests involving MySQL. I looked at using an in-memory database like H2 in its MySQL compatibility mode, but some of our code (e.g. table creation) is too MySQL-specific. Next step: use a RAM disk for MySQL. This is all based on Ubuntu 12.04.

Here’s a script that starts MySQL with the parameters to use /dev/shm for all files, and bootstraps in the root user and system tables. I have verified using iotop and iostat that nothing is written to actual disk with these settings.

As for performance? A full test run of our main data access library has gone from 4:11 to 3:41, about 12% faster. Not much really!
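A rough sketch of the idea, assuming a hypothetical helper `shm_mysqld_opts` and directory layout (this is not the script referred to above): point every path mysqld writes to at a tmpfs directory under /dev/shm. InnoDB's data and log files default to the data directory, so they follow --datadir.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print mysqld options that keep every file MySQL
# writes under one directory, by default on the /dev/shm tmpfs.
shm_mysqld_opts() {
  local base=${1:-/dev/shm/mysql-test}
  printf '%s\n' \
    "--datadir=$base/data" \
    "--tmpdir=$base/tmp" \
    "--socket=$base/mysqld.sock" \
    "--pid-file=$base/mysqld.pid" \
    "--log-error=$base/error.log"
}
```

One plausible sequence: create the data and tmp directories with `mkdir -p`, bootstrap the system tables with `mysql_install_db --datadir=/dev/shm/mysql-test/data` (it may also need --user or --basedir depending on the install), then start the server with `mysqld $(shm_mysqld_opts) &` and point the tests at the socket.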

WordPress Development, Staging and Production Deployment

A.K.A. Keeping Your WordPress in Git

As a web application developer I’m used to having several environments to deploy to: my local workstation, the QA testing environment and our production environment. I’m also accustomed to keeping everything in version control: code, config and deployment scripts. As we prepare a new release it spends time in the QA environment and when testing is complete we move it to production. The method for deploying to QA is very similar to how we deploy to production, since we want to catch bugs in the deployment process itself.

It isn’t obvious how to apply this technique to WordPress deployment. Over the years I have developed a technique for hosting a WordPress ‘development’ environment for our marketing and frontend webdev people to work on before changes are released to the public. We keep all the changes in git and deploy directly from git in one command. I haven’t seen any other good solutions to the problem that a lot of your content is in the database while a whole bunch of stuff is also in the theme files (PHP and JS), so you need to ‘deploy’ the database changes alongside the file changes. Here’s my take on that.

Caveat

This technique BLOWS AWAY the production database during deployment. It is therefore not useful if you have comments enabled in WordPress. We use WordPress more like a CMS than a blog so we are free to replace the database when we deploy. The technique could probably be adapted to only deploy the essential tables (pages, posts etc) and leave the comments table alone.

Usage

Let’s assume the development environment is at /var/www/dev and the production environment is at /var/www/prod.

To ‘checkin’ the dev version

cd /var/www/dev
dump-n-push

To ‘checkout’ the current version into production

cd /var/www/prod
deploy

Set Up

Download the scripts from https://github.com/werkshy/wp-deploy and copy them to /usr/local/bin, which should be in your $PATH.

Everything is checked into git: wordpress files, themes, plugins, db dumps, everything.

Install WordPress in the dev environment

Download and unzip the WordPress release at /var/www/dev.

You’ll need to set up the dev database:

mysql -uroot -p
mysql> create database wpdev;
mysql> grant all on wpdev.* to wordpress identified by 'wordpress';

Set the db parameters in wp-config.php. THIS WILL NOT BE CHECKED IN.

Edit .gitignore, most importantly to block wp-config.php:

/wp-content/cache/
.DS_Store
/wp-config.php
.htaccess

Set up your webserver to serve PHP from that directory as normal (see the example Apache configs at the end of this post).

Add Everything To Git


cd /var/www/dev
git init
git add -A
git commit -m "initial commit"

Create the ‘origin’ repository

You may keep your site on a remote git repo, or in a git repo on the local machine.

Create the ‘origin’ repository:

cd /root/
mkdir wp.git
cd wp.git
git init --bare

Push your dev commit to the origin

cd /var/www/dev
dumpdb
git add -A
git commit -m "add database dump"
git remote add origin /root/wp.git
git push -u origin HEAD

Prepare the production environment

Checkout the files:

cd /var/www
git clone /root/wp.git prod
cd prod
cp /var/www/dev/wp-config.php .

Create the production database (use the same user as the dev one)

mysql -uroot -p
mysql> create database wpprod;
mysql> grant all on wpprod.* to wordpress;

Set the production db name in wp-config.php

Now try loading the db dump into production:

loaddb

If that all works, you can now dump and push the dev site with

dump-n-push

and you can deploy the production site from git with

deploy
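Because wp-config.php is not checked in and names a different database in each environment (wpdev vs wpprod), scripts like dumpdb and loaddb have to discover which database to use. One way to do that, sketched here as an assumption rather than a description of the wp-deploy scripts, is to read DB_NAME out of wp-config.php with a hypothetical helper, `wp_db_name`:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the DB_NAME defined in a wp-config.php, e.g. from
#   define('DB_NAME', 'wpprod');
# Defaults to wp-config.php in the current directory.
wp_db_name() {
  local config=${1:-wp-config.php}
  # Match define('DB_NAME', '...') with optional spaces and either quote style,
  # and print only the captured database name.
  sed -n "s/^[[:space:]]*define([[:space:]]*['\"]DB_NAME['\"][[:space:]]*,[[:space:]]*['\"]\([^'\"]*\)['\"].*/\1/p" "$config"
}
```

Run from /var/www/dev it would print wpdev, and from /var/www/prod wpprod, so the same dump command (for example `mysqldump -u wordpress -p "$(wp_db_name)" > db.sql`, with the dump file name being illustrative) works in both environments.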

Example Apache Config

Development Environment:

<VirtualHost *:80>
	ServerName dev.energyhub.com
	DocumentRoot /var/www/dev
	<Directory "/var/www/dev">
		AllowOverride All
	</Directory>
</VirtualHost>

Production Environment:

<VirtualHost *:80>
	ServerName www.energyhub.com
	DocumentRoot /var/www/prod
	<Directory "/var/www/prod">
		AllowOverride All
	</Directory>
</VirtualHost>

Picflick Update

Here’s Picflick v1.3.
Here’s Picflick v1.3.1.

Here’s Picflick v1.3.2.

Here’s Picflick v1.3.3.

New feature: much simplified single-script setup, getting rid of the picflick_starter wrapper script. The button now calls the picflick script which re-launches itself in a terminal window so you can see the progress. Much easier to understand and configure.

Bug fixes:

  1. The “make install” step failed when the Picasa buttons directory did not already exist. Now fixed. (As reported by Jeff Bloemink).
  2. Use ‘urxvt’ if it is available, but not plain ‘rxvt’, since the latter does not have the -hold option (thanks again to Jeff Bloemink).
  3. Fixed typo in xterm command line (thanks to “Dr AKULAvitch”).
  4. Fixed bug when using AUTH_TOKEN in picflick script instead of ~/.flickrrc (thanks to Mathieu).

Thanks for the bug reports guys. Keep ‘em coming.

Picflick home page

Claws Mail with GMail

Why Claws Mail?

I’ve been suffering more and more recently on my old Thinkpad maxed out at 1GB of RAM. Also I’ve been feeling the need to use a real mail client after a few months of having two GMail windows open (work + personal). Trusty old Thunderbird uses 40+MB of RAM on this machine for three IMAP accounts, using a couple of crucial extensions. 40MB is a large chunk of my precious memory, considering that I’m already using two instances of Firefox (one for browsing, one for web development). If the memory usage hits 1GB then everything grinds for a couple of minutes (swap is so evil on laptops!) until I can kill one of those Firefoxes, so all of my apps have to justify themselves against low memory alternatives. Claws uses about 6MB, so I’m using it for now.

Using Claws Mail with GMail

Claws is remarkably capable as a GMail IMAP client these days. Naturally it supports IMAP over SSL and SMTP over SSL with TLS, which is required for GMail. It also has two features which Thunderbird only supports through extensions or about:config magic settings:

  • You can set Trash to be [Gmail]/Trash.
  • You can set up a shortcut key to archive emails. This isn’t obvious, so here’s how:
      1. Create a label in Gmail called ‘archived’. This is just a label where you can put stuff so it isn’t in the inbox (“inbox” in Gmail is just a label too).
      2. Go to Configuration/Actions.
      3. Add a new action with Menu Name “Archive” and command as a filter action.
      4. Edit filter action, set Action = Move, and Destination = archived.
      5. Save the action. You should now have the action available in the menu under Tools/Actions/Archive and can check that it works.
      6. Now, to set a shortcut key, go to Configuration/Preferences/Other/Miscellaneous and set “Enable customisable keyboard shortcuts”. Then go to Tools/Actions, and with the “archive” action highlighted press ‘Y’ to set the keyboard shortcut.

Other settings:

Tell Claws not to save sent mail, because using Google’s SMTP puts a copy in your sent folder anyway.

Set [Gmail]/Sent Mail to type ‘outbox’ and you can delete the other ‘Sent’ folder, plus you get a nice icon on the sent mail folder. You can do the same with Drafts and Trash.

Broken en_ZA locale in Ubuntu Jaunty

I’m dealing with a lot of documents with the language set to English (South African) this year, and in OpenOffice on Jaunty a ton of perfectly cromulent words are always being flagged as misspelled. On the command line I see an error like this:

Failure loading aff file /usr/share/myspell/dicts/en_ZA.aff

I do have all the relevant packages installed, so it seems that Jaunty has installed an affix file for myspell that OpenOffice, at least, can’t actually use.

The fix is to download the myspell en_ZA files from http://downloads.translate.org.za/spellchecker/

Back up the original files:

cd /usr/share/myspell/dicts/

sudo mv en_ZA.aff en_ZA.aff.bak

sudo mv en_ZA.dic en_ZA.dic.bak

Unzip the download from translate.org.za, copy its en_ZA.aff to /usr/share/myspell/dicts/en_ZA.aff (with sudo), and do the same for en_ZA.dic.

That should give you a working spellchecker in OpenOffice.org.