19

We have a huge ASP.NET web application which needs to be deployed to LIVE with zero or nearly zero downtime. Let me point out that I've read the following question/answers, but unfortunately they don't solve our problem, as our architecture is a little more complicated.

Let's say that currently we have two IIS servers responding to requests, both connected to the same MSSQL server. The solution seems like a piece of cake, but it isn't, because of the major schema changes we have to apply from time to time. Because of its huge size, a simple database backup takes around 8 minutes, which has become unacceptable, but it is a must before every new deploy for safety reasons.

I would like to ask for your help in getting this deployment downtime down as much as possible. If you have any great ideas for a different architecture, or you've used tools which could help us here, then please do not be shy and share the info.

Currently the best idea we've come up with is buying another SQL server, which would be set up as a replica of the original DB. From the load balancer we would route all new traffic to one of the two IIS web servers. When the second web server is free of running sessions, we can deploy the new code to it. Now comes the hard part. At this point we would take the website offline and stop the replication between the two SQL servers, so that we immediately have a snapshot of the database in a (hopefully) consistent state (this saves us 7.5 of the 8 minutes). Finally we would update the database schema on the main SQL server and route all traffic via the updated web server while we upgrade the second web server to the new version.

Please also share your thoughts regarding this solution. Can we somehow manage to eliminate the need to take the website offline? How do bluechip companies with mammoth web applications do deployment?

Every idea or suggestion is more than welcome! Buying new hardware or software is really not a problem - we're just missing the breakthrough idea. Thanks in advance for your help!

Edit 1 (2011.01.12):
Another requirement is to eliminate manual intervention, so in fact we are looking for a process that can be applied in a fully automated way.

Let me just remind you of the requirement list:
1. Backup of the database
2a. Deployment of the website
2b. Update of the database schema
3. Switch to the updated website
4. (optional) An easy way of reverting to the old website if something goes very wrong.

Skorpioh
  • How can you be free of sessions in a zero downtime environment? – gbn Jan 05 '11 at 12:18
  • What kind of asp.net application is it? – Pauli Østerø Jan 05 '11 at 12:18
  • 10
    "Because of it's huge size, a simple database backup takes around 8 minutes" - that's not huge! – Mitch Wheat Jan 05 '11 at 12:39
  • @gbn: If by "free of sessions" you mean that the sessions are not reset when we deploy, then this is true, as we don't use in-proc session management. Either state server or DB state management solves this issue. In these two cases, recycling the application pool also doesn't affect session management. – Skorpioh Jan 05 '11 at 12:40
  • @Pauli Østerø: Well, mainly it's a huge webshop with let's say 1 million unique visitors/day. From the technical point of view it's written in ASP.NET with a C# backend. This is why we cannot afford to drop the site even for 8-10 minutes as there may be like 10k active sessions at any time => lost sales => upset management/stakeholders ... – Skorpioh Jan 05 '11 at 12:44
  • 1
    @Mitch Wheat: I think this is a point of view question. Of course it could be 1 hour / day so 8 mins is not huge, but compared to 0 it's infinitely more ;) – Skorpioh Jan 05 '11 at 12:46
  • 4
    "I think this is a point of view question" - Nope. If your Database can be backed up in 8 minutes, it's not huge by any definition I'm aware of. – Mitch Wheat Jan 05 '11 at 12:56
  • Partial backups? Maybe something like 1/7th, which would be about 1.2 minutes at 3am, might work. We use the option where we actually have multiple servers but they appear as one; I think sessions are still lost and you would have to log back in, but we do this when traffic is really low. Our partial backup takes 8 hours. – Spooks Jan 05 '11 at 13:32
  • I'm not sure why you list 'backup of database' as a requirement for this question. You should be backing it up anyway, shouldn't you? And unless your update takes strictly 0 time, your backup will be out of date anyway, if users can continue to use your site all the time... And I'm not sure what you mean by 'eliminate manual intervention', that sounds a bit like 'requires no work, just new hardware'. I fear that's not going to happen :) – Benjol Jan 18 '11 at 09:34

9 Answers

8

First off, you are likely unaware of the "point in time restore" concept. The long and short of it is that if you're properly backing up your transaction logs, it doesn't matter how long your backups take -- you always have the ability to restore back to any point in time. You just restore your last backup and reapply the transaction logs since then, and you can get a restore right up to the point of deployment.
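For anyone unfamiliar with the mechanics, here is a minimal T-SQL sketch of the point-in-time restore idea; the database name, file paths, and timestamp below are placeholders, not details from the question:

    -- Regular full backup (names and paths are placeholders)
    BACKUP DATABASE [Shop] TO DISK = N'D:\Backups\Shop_full.bak' WITH INIT;

    -- Frequent transaction log backups between full backups
    BACKUP LOG [Shop] TO DISK = N'D:\Backups\Shop_log_0800.trn';

    -- To roll back to the moment just before a deployment went wrong:
    RESTORE DATABASE [Shop] FROM DISK = N'D:\Backups\Shop_full.bak'
        WITH NORECOVERY, REPLACE;
    RESTORE LOG [Shop] FROM DISK = N'D:\Backups\Shop_log_0800.trn'
        WITH STOPAT = '2011-01-17T08:00:00', RECOVERY;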

What I would tend to recommend would be reinstalling the website on a different Web Site definition with a "dead" host header configured -- this is your staging site. Make a script which runs your db changes all at once (in a transaction) and then flips the host headers between the live site and the staging site.
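As an illustration of the "db changes all at once (in a transaction)" part, here is a hedged T-SQL sketch; the table and column names are invented, and the host-header flip itself is an IIS configuration step that is not shown:

    SET XACT_ABORT ON;   -- abort and roll back the whole batch on any error
    BEGIN TRANSACTION;

        -- example schema changes; object names are placeholders
        ALTER TABLE dbo.Orders ADD DeliveryNote nvarchar(500) NULL;
        EXEC sp_rename 'dbo.Customers.Phone', 'PhoneNumber', 'COLUMN';

    COMMIT TRANSACTION;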

Dave Markle
  • Thanks for your response. We're of course aware of the point-in-time restore recovery model, which by the way is really a blessing when something unexpected and irreversible happens to the database. Still, I don't see how this helps with backing up the database and deploying with zero downtime. I still don't know what happens to the session state database holding the active sessions, and especially what happens to POST requests when we modify, for example, pages containing forms. Will the viewstate survive these changes or will it fail? – Skorpioh Jan 17 '11 at 17:58
  • If you're adding new Controls to these pages then viewstate won't survive anyway. Anyone who attempts to reload pages with additional new Controls will get errors saying their viewstate is corrupted. Not that this technically counts as "downtime", but depending on your user requirements this could be a major inconvenience. – E. Rodriguez Jan 18 '11 at 21:49
4

Environment:

  1. Current live web site(s)
  2. Current live database
  3. New version of web site(s)
  4. New version of database

Approach:

  1. Set up a feed (e.g. replication, a stored procedure, etc.) so that the current live database server sends data updates to the new version of the database (a sketch follows below).
  2. Change your router so that new requests are pointed to the new version of the website, and wait until the old sites are no longer serving requests.
  3. Take down the old site and database.

In this approach there is zero downtime because both the old site and the new site (and their respective databases) are permitted to serve requests side-by-side. The only problem scenario is clients who have one request go to the new server and a subsequent request go to the old server. In that scenario, they will not see the new data that might have been created on the new site. A solution to that is to configure your router to temporarily use sticky sessions and ensure that new sessions all go to the new web server.
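To make step 1 a little more concrete, here is a hedged sketch of a simple stored-procedure-based feed. It assumes SQL Server 2008+ (for MERGE), a rowversion column on the source table, and that the new-schema database (called ShopV2 here) lives on the same instance - all of these are assumptions, not details from the answer:

    -- Tracks how far each table has been pushed to the new database
    CREATE TABLE dbo.SyncState (TableName sysname PRIMARY KEY, LastVersion binary(8) NOT NULL);

    CREATE PROCEDURE dbo.PushOrdersToNewSchema
    AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @last binary(8);
        SELECT @last = LastVersion FROM dbo.SyncState WHERE TableName = 'Orders';

        -- copy rows changed since the last run into the new-schema database
        MERGE ShopV2.dbo.Orders AS target
        USING (SELECT OrderId, CustomerId, Total
               FROM dbo.Orders
               WHERE RowVer > @last) AS source
        ON target.OrderId = source.OrderId
        WHEN MATCHED THEN
            UPDATE SET CustomerId = source.CustomerId, Total = source.Total
        WHEN NOT MATCHED THEN
            INSERT (OrderId, CustomerId, Total)
            VALUES (source.OrderId, source.CustomerId, source.Total);

        -- remember the high-water mark (deletes need separate handling)
        UPDATE dbo.SyncState
        SET LastVersion = (SELECT TOP (1) CAST(RowVer AS binary(8))
                           FROM dbo.Orders ORDER BY RowVer DESC)
        WHERE TableName = 'Orders';
    END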

Thomas
3

One possibility would be to use versioning in your database.

So you have a global setting which defines the current version of all stored procedures to use.

When you come to do a release you do the following:
1. Change the database schema, ensuring no stored procedures of the previous version are broken.
2. Release the next version of the stored procedures.
3. Change the global setting, which switches the application to use the next set of stored procedures/new schema.

The tricky portion is ensuring you don't break anything when you change the database schema.

If you need to make fundamental changes, you'll need either to use 'temporary' tables, which are used for one version before moving to the schema you want in the next version, or to modify the previous version's stored procedures to be more flexible.

That should mean almost zero downtime, if you can get it right.
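To make the idea concrete, here is a hedged T-SQL sketch of one way the "global setting" could work; the table, procedure, and version names are invented for illustration:

    -- Global setting that records which stored procedure version is live
    CREATE TABLE dbo.AppSettings (SettingName sysname PRIMARY KEY, SettingValue nvarchar(50) NOT NULL);
    INSERT INTO dbo.AppSettings (SettingName, SettingValue) VALUES ('ProcVersion', 'v1');
    GO

    -- Thin dispatcher the application calls; the version-specific implementations
    -- (dbo.GetOrderDetails_v1 / _v2, assumed to exist) live side by side underneath it
    CREATE PROCEDURE dbo.GetOrderDetails
        @OrderId int
    AS
    BEGIN
        DECLARE @version nvarchar(50);
        SELECT @version = SettingValue FROM dbo.AppSettings WHERE SettingName = 'ProcVersion';

        IF @version = 'v2'
            EXEC dbo.GetOrderDetails_v2 @OrderId;
        ELSE
            EXEC dbo.GetOrderDetails_v1 @OrderId;
    END
    GO

    -- The release script then flips every dispatcher over in a single statement:
    UPDATE dbo.AppSettings SET SettingValue = 'v2' WHERE SettingName = 'ProcVersion';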

Bravax
  • Thanks for your response. Unfortunately this implies manual intervention, but we are looking for a way to get this done in an automated fashion. – Skorpioh Jan 12 '11 at 11:32
  • 1
    Sorry, where is the manual intervention? This should be completely scripted. (Well thought out and programmed scripts, but scripts all the same.) – Bravax Jan 12 '11 at 17:06
  • @Skorpioh – I think what Bravax is describing involves designing your DB and related code to explicitly support multiple versions, much like an API supports previous functional calling conventions for backwards compatibility. See Dave Amphlett's answer for an outline of this strategy. – Kenny Evitt Sep 06 '11 at 16:03
3

Firstly - do regular, small changes - I've worked as a freelance developer in several major Investment Banks on various 24/7 live trading systems and the best, smoothest deployment model I ever saw was regular (monthly) deployments with a well defined rollback strategy each time.

In this way, all changes are kept to a minimum, bugs get fixed in a timely manner, development doesn't suffer from feature creep, and because it's happening so often, EVERYONE is motivated to get the deployment process as automatic and hiccup-free as possible.

But inevitably, big schema changes come along that make a rollback very difficult (although it's still important to know - and test - how you'll rollback in case you have to).

For these big schema changes we worked a model of 'bridging the gap'. That is to say that we would implement a database transformation layer which would run in near real-time, updating a live copy of the new style schema data in a second database, based on the live data in the currently deployed system.

We would copy this a couple of times a day to a UAT system and use it as the basis for testing (hence testers always have a realistic dataset to test, and the transformation layer is being tested as part of that).

So the change in database is continuously running live, and the deployment of the new system then is simply a case of:

  1. Freeze everyone out
  2. Switch off the transformation layer
  3. Turn on the new application layer
  4. Switch users over to the new application layer
  5. Unfreeze everything

This is where rollback becomes something of an issue, though. If the new system has run for an hour, rolling back to the old system is not easy. A reverse transformation layer would be the ideal solution, but I don't think we ever got anyone to buy into the idea of spending the time on it.

In the end we'd deploy during the quietest period possible and get everyone to agree that rollback would take us to the point of switchover and anything missing would have to be manually re-keyed. Mind you - that motivates people to test stuff properly :)

Finally - how to do the transformation layer - in some of the simpler cases we used triggers in the database itself. Only once, I think, did we graft code into a previous release that did 'double updates': the original update to the current system, and another update to the new-style schema. The intention was to release the new system at the next release, but testing revealed the need for tweaks to the database, and the 'transformation layer' was already in production at that point, so that process got messy.
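A hedged sketch of what the trigger-based variant might look like; the databases, tables, and the Phone-to-PhoneNumber mapping are invented, and delete handling is omitted:

    -- Trigger on the live (old-schema) table that keeps the copy in the
    -- new-schema database (ShopV2, a placeholder name) up to date
    CREATE TRIGGER dbo.trg_Customers_SyncNewSchema
    ON dbo.Customers
    AFTER INSERT, UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;

        -- update rows that already exist in the new-schema copy
        UPDATE t
        SET    t.PhoneNumber = i.Phone,    -- example of a renamed column
               t.Email       = i.Email
        FROM   ShopV2.dbo.Customers AS t
        JOIN   inserted AS i ON i.CustomerId = t.CustomerId;

        -- insert rows that are new
        INSERT INTO ShopV2.dbo.Customers (CustomerId, PhoneNumber, Email)
        SELECT i.CustomerId, i.Phone, i.Email
        FROM   inserted AS i
        WHERE  NOT EXISTS (SELECT 1 FROM ShopV2.dbo.Customers AS t
                           WHERE t.CustomerId = i.CustomerId);
    END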

The model we used most often for the transformation layer was simply another server process running, watching the database and updating the new database based on any changes. This worked well because that code runs outside of production and can be changed at will without affecting the production system (well, it can if you run against a replica of the production database; otherwise you have to be careful not to tie the production database up with some suicidal queries - just put your best, most conscientious people on this part of the code!).

Anyway - sorry for the long ramble - I hope I've got the idea across: continuously run your database deployment as a 'live, running' deployment to a second database, and then all you've got to do to deploy the new system is deploy the application layer and pipe everything to it.

Dave Amphlett
  • Based on things I've read before on how Google releases updates to their apps, e.g. Gmail, this kind of explicit deployment-ready design is required to smoothly deploy changes for high-volume apps. – Kenny Evitt Sep 06 '11 at 16:07
1

See my answer here: How to deploy an ASP.NET Application with zero downtime

My approach is to use a combination of polling AppDomains and a named mutex to create an atomic deployment agent.

Jack
1

I saw this post a while ago but have never used it, so I can't vouch for its ease of use or suitability; however, MS have a free web farm deployment framework that may suit you:

http://weblogs.asp.net/scottgu/archive/2010/09/08/introducing-the-microsoft-web-farm-framework.aspx

Paddy
0

I would recommend using Analysis Services instead of the database engine for your reporting needs. Then you could process your cubes, move your database, change a connection string, reprocess your cubes, and thus have zero downtime.

Dead serious... There isn't a better product in the world than Analysis Services for this type of thing.

Aaron Kempf
  • Thanks for your response. I've been looking into Analysis Services and I'm pretty sure that it's not suitable to solve our issues. – Skorpioh Jan 17 '11 at 17:49
0

As you say you don't have a problem buying new servers, I suggest the best way would be to get a new server and deploy your application there first. Follow the steps below:
1. Add any certificates required to this new server and test your application with the new settings.
2. Shut down your old server and assign its IP to the new server; the downtime would be only as long as it takes your old server to shut down and for you to assign its IP to the new server.
3. If you see that the new deployment is not working, you can always revert by following step 2 again.
Regarding your database backup, you would have to set up a backup schedule.

Ravi Vanapalli
  • 1
    I'm sorry, but point nr.2 is exactly 180 degrees in contradiction with the definition of "zero downtime" – Skorpioh Jan 18 '11 at 11:40
0

I just answered a similar question here: Deploy ASP.NET web site and Update MSSQL database with zero downtime

It discusses how to update the database and the IIS website during a deployment with zero downtime, mainly by ensuring your database is always backwards compatible (but only with the previous application release).
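As a hedged illustration of what such backwards-compatible changes tend to look like (object names are invented), each change is purely additive, so the previous application release keeps working against the updated schema:

    -- Additive changes only: nothing the old release depends on is removed or renamed
    ALTER TABLE dbo.Products ADD DiscountPercent decimal(5,2) NOT NULL
        CONSTRAINT DF_Products_DiscountPercent DEFAULT (0);

    -- Instead of renaming a column, add the new one and backfill it;
    -- the old column is dropped in a later release, once no live
    -- application version reads it any more
    ALTER TABLE dbo.Products ADD DisplayName nvarchar(200) NULL;
    GO
    UPDATE dbo.Products SET DisplayName = Name WHERE DisplayName IS NULL;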

David Duffett