We’ve never used the bundled Tomcat clustering solutions, mainly because we run on EC2 and the multicast-based solution doesn’t work there. For about 18 months we’ve been using Memcached Session Manager (m-s-m), which stores the sessions in memcached. It works pretty well, but avoiding a single point of failure in a simple protocol like memcached is hard. m-s-m solves this problem by saving the session to a given memcached node and also a backup server. This has led to high complexity: in our case we counted 6 memcached calls for each web request. In defense of m-s-m, the author is very active on the mailing list and helped us debug a lot of corner case problems, often pushing out a release within a day or two of us reporting a problem. Also we were using the non-sticky configuration which is less well tested.
In the end, the sheer amount of code and the number of steps for each request in m-s-m made it hard to diagnose subtle intermittent bugs. We found David Dawson’s Mongo Tomcat Sessions plugin which very cleanly loads a session from Mongo at the beginning of the request and saves it back at the end. This is smartly putting the replication logic into the database layer. Mongo is great for easy-to-use replication, and we rely on it in production already, but for decent replication we’d be talking about two or three servers, which adds cost and maintenance burden. We figured we could take the general approach but use Amazon’s DynamoDB on the backend: no worries about deploying or monitoring the storage layer.
When a request comes in, we look it up by ID in a Dynamo table. If it’s not found we’ll start a new session. After the request a helper Tomcat valve calls back to save the session back to Dynamo. This approach works well, the only thing to consider before rushing to deploy is that Dynamo must be configured for a certain throughput. In our case, the throughput
t that must be provisioned is
t = s*r,
s is the session size rounded up, in kB; and r is the request rate (requests per second). For example, if the vast majority of sessions are
1 < s < 2 kB and we have a maximum request rate of 100 req/s, then we must provision the table for
2*100 = 200 read units and 200 write units.
Session Expiration by table Rotation
Moving to Dynamo loses us one key feature vs MongoDB: secondary indices. In this case, it means we can't have an index on the last modified time of the session, which could be used to delete the expired sessions. We have to workarounds in Dynamo:
- Scan the whole table for expired 'last modified time' sessions. This is expensive and hard to provision for: if you have a high session turnover you could have millions of sessions to scan through, but only hundred of provisioned reads per second in which to do so.
- Move active sessions to a new table and drop the old one. This is the approach we have taken. For example, if the expiration time is one hour, we will start by saving our sessions into table 'A'. After one hour, we will create a new table, 'B', and start saving new sessions into B. When loading an existing session, we will look in 'B', and if not found, in 'A'. In this manner active sessions will be moved to table B over the next hour. After another hour, we will create table 'C' and start saving there. At this point all sessions that only exist in table 'A' are older than one hour and can be safely dropped, so we delete the whole table.
Extra Features / Advanced Settings
- We've added optional monitoring via statsd, which we use heavily in production.
- We auto-provision the read-write capacity when we create tables, based on a given session size and request rate.
- We've added the ability to ignore requests by URI or based on the presence of certain HTTP headers. This is useful for us because we have a lot of machine-generated traffic that doesn't use sessions.
See the code at https://github.com/werkshy/dynamo-session-manager or use the jars from http://repo1.maven.org/maven2/net/energyhub/dynamo-session-manager/