Easier Production Releases
December 6th, 2007
Get rid of the 2:00 AM releases using Apache and mod_proxy_balancer.
I’ve been a part of some late night release procedures and they’re never fun. You’ve got QA, Dev, IT and a handful of managers sitting in their jammies in a group IM (or worse, a conference call) from 2:00 AM until way too early in the morning. Everyone’s grumpy and sleepy, causing the release to be more difficult and take longer. Sometimes the dreaded “rollback!” is yelled. All this because you’re running a high profile website that needs to be accessible 24/7, and 2:00 AM - 5:00 AM downtime is better than daytime downtime.
For this example we’ll have a 4 server production cluster, each server running Tomcat. We’ll call them web-1.example.com through web-4.example.com.
(Please note that these configuration examples are from Apache/2.2.3)
The downside of this approach is that you need twice as many frontend servers, because you need two separate, fully built-out clusters. Instead of web-1 through web-4 we now have web-1.prod-a through web-4.prod-a and web-1.prod-b through web-4.prod-b.
Now we set up the prod-a and prod-b clusters in Apache’s httpd.conf:
<VirtualHost *:80>
    ServerName prod-a.example.com
    ProxyPass / balancer://web-prod-a/
    ProxyPassReverse / balancer://web-prod-a/
    ProxyPreserveHost On
    <Proxy balancer://web-prod-a>
        BalancerMember ajp://web-1.prod-a.example.com:8009
        BalancerMember ajp://web-2.prod-a.example.com:8009
        BalancerMember ajp://web-3.prod-a.example.com:8009
        BalancerMember ajp://web-4.prod-a.example.com:8009
        Order Allow,Deny
        Allow from all
    </Proxy>
</VirtualHost>
<VirtualHost *:80>
    ServerName prod-b.example.com
    ProxyPass / balancer://web-prod-b/
    ProxyPassReverse / balancer://web-prod-b/
    ProxyPreserveHost On
    <Proxy balancer://web-prod-b>
        BalancerMember ajp://web-1.prod-b.example.com:8009
        BalancerMember ajp://web-2.prod-b.example.com:8009
        BalancerMember ajp://web-3.prod-b.example.com:8009
        BalancerMember ajp://web-4.prod-b.example.com:8009
        Order Allow,Deny
        Allow from all
    </Proxy>
</VirtualHost>
Then create internal DNS entries that point to these two hosts: prod-a.example.com and prod-b.example.com. You should now have two separate but identical production clusters that are accessible only internally. Once both clusters work independently, create the public cluster:
<VirtualHost *:80>
    ServerName example.com
    ServerAlias www.example.com
    ProxyPass / balancer://web-prod/
    ProxyPassReverse / balancer://web-prod/
    ProxyPreserveHost On
    <Proxy balancer://web-prod>
        BalancerMember ajp://web-1.prod-a.example.com:8009
        BalancerMember ajp://web-2.prod-a.example.com:8009
        BalancerMember ajp://web-3.prod-a.example.com:8009
        BalancerMember ajp://web-4.prod-a.example.com:8009
        BalancerMember ajp://web-1.prod-b.example.com:8009
        BalancerMember ajp://web-2.prod-b.example.com:8009
        BalancerMember ajp://web-3.prod-b.example.com:8009
        BalancerMember ajp://web-4.prod-b.example.com:8009
        Order Allow,Deny
        Allow from all
    </Proxy>
</VirtualHost>
Set up an external DNS entry for example.com and www.example.com (and the appropriate firewall rules to get port 80 of that IP to this Apache box) and you now have a website that load balances over two separate clusters. You’re ready for a release during BUSINESS HOURS!
Choose a cluster to release to first and comment out all of its servers in the web-prod balancer. Run an apachectl graceful and no new traffic should go to that cluster. Shut down the cluster, do your release thing, start the cluster back up and then send QA off to test it using the specific prod-a.example.com or prod-b.example.com URL. Once QA confirms it’s good, uncomment the cluster in the web-prod balancer and comment out the other one. Fire off another graceful restart and repeat the release procedure on that cluster.
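As a rough sketch of the first half of that procedure, this is how the web-prod balancer might look while prod-a is being released. It assumes the same hostnames and ports as the configuration above and adds nothing new; the prod-a members are simply commented out so that only prod-b receives public traffic after the graceful restart.

```apache
<Proxy balancer://web-prod>
    # prod-a is out of rotation while it is upgraded and tested
    # BalancerMember ajp://web-1.prod-a.example.com:8009
    # BalancerMember ajp://web-2.prod-a.example.com:8009
    # BalancerMember ajp://web-3.prod-a.example.com:8009
    # BalancerMember ajp://web-4.prod-a.example.com:8009

    # prod-b keeps serving the public site
    BalancerMember ajp://web-1.prod-b.example.com:8009
    BalancerMember ajp://web-2.prod-b.example.com:8009
    BalancerMember ajp://web-3.prod-b.example.com:8009
    BalancerMember ajp://web-4.prod-b.example.com:8009
    Order Allow,Deny
    Allow from all
</Proxy>
```

The prod-a.example.com virtual host has its own web-prod-a balancer, so QA can still reach the upgraded cluster internally while it is out of the public rotation. Running apachectl configtest before apachectl graceful is a cheap way to catch a typo before it reaches the live balancer. For the second half of the release, swap which four lines are commented.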
And *poof* you just rolled out a release with no downtime and very little impact to end users! Of course, this was just a very simple example. You can get much more detailed in your cluster design, especially with JBoss and different components. Just remember your individual clusters have to be completely separate. They can’t share any resources. Although, in all my experience they’ve always shared a database. You just need to make certain that any database updates the devs require you to make for the upgrade won’t cause the older release any harm.
This concept should work in most situations; it just might take more planning, design and setup. I’ve got a similar setup running with 36 JBoss servers, 18 in each cluster. Quite a bit more had to be configured, especially to make sure that each individual cluster acts independently, and then there are the SSL issues and sticky session requirements. But it’s so worth it, and everyone will love you for cutting out the need for past-midnight release “parties”.
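For the sticky session requirement, one possible approach on Apache 2.2 is the stickysession parameter of ProxyPass combined with a route on each BalancerMember. This is a sketch under assumptions, not a configuration taken from the 36-server setup: the route names (web1a, web1b and so on) are made up for illustration, each would have to match the jvmRoute set on the corresponding Tomcat's Engine element in server.xml, and the application is assumed to use Tomcat's default JSESSIONID cookie.

```apache
<VirtualHost *:80>
    ServerName example.com
    ServerAlias www.example.com
    # Assumption: sessions are tracked with the default JSESSIONID cookie
    ProxyPass / balancer://web-prod/ stickysession=JSESSIONID
    ProxyPassReverse / balancer://web-prod/
    ProxyPreserveHost On
    <Proxy balancer://web-prod>
        # route values are hypothetical; each must equal that Tomcat's jvmRoute
        BalancerMember ajp://web-1.prod-a.example.com:8009 route=web1a
        BalancerMember ajp://web-2.prod-a.example.com:8009 route=web2a
        BalancerMember ajp://web-3.prod-a.example.com:8009 route=web3a
        BalancerMember ajp://web-4.prod-a.example.com:8009 route=web4a
        BalancerMember ajp://web-1.prod-b.example.com:8009 route=web1b
        BalancerMember ajp://web-2.prod-b.example.com:8009 route=web2b
        BalancerMember ajp://web-3.prod-b.example.com:8009 route=web3b
        BalancerMember ajp://web-4.prod-b.example.com:8009 route=web4b
        Order Allow,Deny
        Allow from all
    </Proxy>
</VirtualHost>
```

Keep in mind that a session pinned to a member you have commented out will most likely be sent to one of the remaining members, so users on the cluster being released may lose their session state unless sessions are replicated or stored somewhere shared.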
10 Responses to “Easier Production Releases”
By Marcin on Dec 6, 2007 | Reply
This is pretty neat but one issue is that you’re still potentially impacting users. For example say a user is half way through a checkout process, they can finish the current request but not go through the rest of their process.
Not sure about the implementation, as I don’t deal with these kinds of environments, but you’d want a way to prevent people starting new sessions. It’s probably something you could do at the application level, with some sort of message for new sessions. Then wait for a reasonable amount of time for users to “finish their business” before taking the cluster down. You could have some sort of message appear on each page giving them some warning.
Anyway, just some thoughts!
By Ryan Hadley on Dec 6, 2007 | Reply
Yeah, that’s why I avoided saying no downtime/no impact.
But it is a very good point. Doing an apachectl graceful will always interrupt someone. So whether this is acceptable depends on what kind of service your app provides. I’ve never run an app so important that I couldn’t bump some users for a second or two. But I’m sure there are some out there.
By Callum on Dec 6, 2007 | Reply
Doesn’t 2 x 18 = 36?
By Ryan Hadley on Dec 6, 2007 | Reply
heh. So it does. *fixes*
By Michael S. Moody on Dec 6, 2007 | Reply
Wouldn’t it be easier to set up an lvs box? http://www.linuxvirtualserver.org
We use that, and a Pentium-3 based machine will easily load balance 100s of Mbits of traffic using the LVS-DR or LVS-TUN method. No need to drop $10k on a hardware load balancer (most of which just use ipvs code anyway); simply set one of these up on anything above a P3 with gigabit Ethernet cards, and you’ll have everything you need.
Michael
By Ryan Hadley on Dec 6, 2007 | Reply
That does look very cool, thank you for the link. Whether or not it’d be easier to set up, I don’t know since I’ve never heard of it before now. But setting up an apache with mod_proxy_balancer is by no means hard.
I’m definitely going to play with lvs though!
By Venkat on Dec 10, 2007 | Reply
Assume that during a release I have a few database schema changes. How will this solution work?
By Ryan Hadley on Dec 10, 2007 | Reply
Some database schema changes can definitely ruin this entire solution. As long as the database schema changes don’t cause the old version of the software to blow up/malfunction/or some other kind of unwanted result, you can still go ahead with this solution. Like, for instance, adding a new table. Small updates to the db like this would be fine.
Obviously calling an alter table on a really large table in a MySQL database will cause downtime for that table. I “love” that MySQL will treat all alter tables as create new temp table, copy the entire table to temp table, drop old table, rename temp table.
If your MySQL database change is of this sort, let’s say increasing a varchar(50) to a varchar(100) on a gigantic table, then the issue isn’t whether the old software will function properly afterwards; it’s how you deal with the table locks while it copies this gigantic table.
My solution is MySQL replication. We have two database servers, one a master and the other a (usually) read-only slave. You do the alter table on the read-only slave, then wait for it to finish and for replication to catch up. Then you need to take 5 minutes of downtime. Run FLUSH TABLES WITH READ LOCK on the master database and wait for all transactions to finish and replication to halt. Switch server roles (writing a script to handle this is suggested, for speed and accuracy). Now your read-only slave with the alter table completed is the master database. Make sure your software and replication are happy, and finally run the alter table on the new read-only slave (the former master).
By Dustin Puryear on Dec 20, 2007 | Reply
Nifty. mod_proxy can do some nifty things, and can’t you use it along with some custom maps and other tweaks to do some smart balancing? I think you can call an external script to determine the next device (say, if you wanted to evaluate load). Maybe that’s another module..
Pound is a not-so-smart-but-effective load-sharing tool.
Yes, LVS is a good move. I actually did an 8 hour tutorial on it at a past USENIX. The slides are here, somewhere:
http://www.puryear-it.com/pubs/articles
Also, some cheaper load-balancers are available. I’ve had good results with CoyotePoint, although there were some issues with load-balancing DNS at the time.
–
Dustin Puryear
Author, Best Practices for Managing Linux and UNIX Servers
http://www.puryear-it.com/pubs/linux-unix-best-practices
By Mike C on Feb 7, 2008 | Reply
Ryan,
I’d really like to see an example of this script. Can you go into more details? I have this exact situation arising shortly.
Mike