134

To deploy a new version of our website we do the following:

  1. Zip up the new code, and upload it to the server.
  2. On the live server, delete all the live code from the IIS website directory.
  3. Extract the new code zipfile into the now empty IIS directory

This process is all scripted, and happens quite quickly, but there can still be a 10-20 second downtime when the old files are being deleted, and the new files being deployed.

Any suggestions on a 0 second downtime method?

Sklivvz
Karl Glennon

13 Answers

86

You need two servers and a load balancer. Here are the steps:

  1. Direct all traffic to Server 2
  2. Deploy on Server 1
  3. Test Server 1
  4. Direct all traffic to Server 1
  5. Deploy on Server 2
  6. Test Server 2
  7. Direct traffic to both servers

Thing is, even in this case you will still have application restarts and loss of sessions if you are using "sticky sessions". If you have database sessions or a state server, then everything should be fine.
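
If your load balancer probes a health URL, the "turn traffic off a server" step can be scripted by failing that probe on purpose, so existing requests drain before you deploy. A minimal sketch in classic ASP.NET (not from the answer; the handler name, URL mapping and marker-file path are all illustrative):

// HealthCheckHandler.cs -- map to e.g. /health.ashx and point the load
// balancer's probe at it. The deployment script drops the marker file to
// take this server out of rotation; new traffic goes to the other server.
using System.IO;
using System.Web;

public class HealthCheckHandler : IHttpHandler
{
    // Hypothetical marker file created by the deployment script.
    private const string DrainMarker = @"C:\deploy\drain.marker";

    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        if (File.Exists(DrainMarker))
        {
            // The load balancer sees a failure and stops sending traffic here.
            context.Response.StatusCode = 503;
        }
        else
        {
            context.Response.StatusCode = 200;
            context.Response.Write("OK");
        }
    }
}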

Sklivvz
  • 4
    You can also configure the load balancer so that it services existing sessions for a given server, but doesn't accept new ones. That allows you to avoid dropping sessions. This technique however requires waiting for the sessions to end, and in general you'll want to script this. –  Oct 07 '09 at 04:13
  • 44
    This method tends to fall down when the code roll has structural changes to the database. Once you upgrade the DB for Server 1, Server 2 will explode. Now you can backup/restore the database for testing on Server 1, but then you have the issue of sorting out the data that changed in the live DB while the parallel copy was running. – EBarr Aug 26 '10 at 22:03
  • @EBarr : Have version 2 point to a different DB altogether. This way you can keep server 2 on version 1 while upgrading server 1. – Andrei Rînea Jan 04 '11 at 20:09
  • 1
    @AndreiRinea -- how do you suppose this would work in a high-volume OLTP system? Either the system goes out of sync and you lose data when you cut over, or you need to pause data entry and write a script to identify & migrate the transitory data to the new DB structure. – EBarr Jan 05 '11 at 21:45
    @EBarr : In the case of a high-volume OLTP system you will need to take the whole system down for maintenance, after warning users well in advance and prohibiting logins after a certain point. – Andrei Rînea Jan 13 '11 at 17:25
  • 1
    @Andrei Rinea --- exactly my point! Look at my first comment ... the method above fails. Which begs the question posed by @Recursieve -- "how to deploy an asp.net application with zero down time?" – EBarr Jan 20 '11 at 13:31
  • 1
    @EBarr: Unless you plan your db changes for zero downtime, forget about it. If you use stored procedures this is always possible (maybe involving the use of mirroring with a witness). – Sklivvz Jan 20 '11 at 15:47
  • 13
    @EBarr: and in any case *technically* you still have zero downtime on the ASP.NET app -- the question isn't "how to deploy to a sql server db with zero downtime". – Sklivvz Jan 20 '11 at 15:48
  • 1
    This can work if your load balancer (or webserver) has the ability to temporarily "hold" http requests ... but I don't know of any – Jack Sep 17 '12 at 12:07
  • @sklivvz, any full source code script sample about it? is it possible, Powershell maybe – Kiquenet Dec 12 '12 at 07:38
  • 10
    The key is to develop in a way that your sql changes aren't destructive. You often have to do any destructive sql changes in the following release, once the old schema is no longer used. It's not hard to do with practice. – Bealer Dec 19 '13 at 17:38
  • 1
    @Bealer that issue is relevant in any configuration: if you depend on more than one component each change should be backwards compatible to interop with the rest. – Sklivvz Dec 19 '13 at 17:40
  • 1
    @AminM buy or rent another? :-) Joking aside, you can have two versions running on the same server, but you'd still need a load balancer to direct the traffic, acting as a reverse proxy as well in this case. – Sklivvz Jan 22 '16 at 12:54
60

The Microsoft Web Deployment Tool supports this to some degree:

Enables Windows Transactional File System (TxF) support. When TxF support is enabled, file operations are atomic; that is, they either succeed or fail completely. This ensures data integrity and prevents data or files from existing in a "half-way" or corrupted state. In MS Deploy, TxF is disabled by default.

It seems the transaction is for the entire sync. Also, TxF is a feature of Windows Server 2008, so this transaction feature will not work with earlier versions.

I believe it's possible to modify your script for zero downtime, using folders as versions and the IIS metabase:

  • For an existing path/URL:
  • Copy the new (or modified) website to the server under
    • \web\app\v2.1\
  • Modify the IIS metabase to change the website path
    • from \web\app\v2.0\
    • to \web\app\v2.1\
This method offers the following benefits:

  • In the event the new version has a problem, you can easily roll back to v2.0
  • To deploy to multiple physical or virtual servers, you could use your script for file deployment. Once all servers have the new version, you can simultaneously change all servers' metabases using the Microsoft Web Deployment Tool.
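
The path switch itself can be scripted. As a hedged illustration only (the metabase is IIS 6; on IIS 7+ the equivalent is Microsoft.Web.Administration, and the site name and folders below are assumptions), the swap boils down to one property change:

// SwitchSitePath.cs -- minimal sketch of the versioned-folder swap on IIS 7+.
using Microsoft.Web.Administration;

class SwitchSitePath
{
    static void Main()
    {
        using (var iis = new ServerManager())
        {
            var site = iis.Sites["MyWebsite"];                   // assumed site name
            var root = site.Applications["/"].VirtualDirectories["/"];

            root.PhysicalPath = @"C:\web\app\v2.1";              // was C:\web\app\v2.0
            iis.CommitChanges();                                 // the switch is atomic from IIS's view
        }
    }
}

Rolling back is the same one-line change pointed back at the v2.0 folder.
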
George Tsiokos
  • 5
    I've implemented this approach by adapting our powershell deployment scripts. You can see the part of the script which changes the IIS site folder here: http://stackoverflow.com/questions/330608/changing-iis6-site-home-directory-with-powershell-answered Thanks for the pointer. – Karl Glennon Dec 01 '08 at 16:14
  • 17
    Unfortunately, this method doesn't account for structural changes to the DB. Once you upgrade the DB for v2.1 then v.2.0 explodes. – EBarr Aug 26 '10 at 22:07
  • 8
    Using TxF is overkill here, IMO. It doesn't hurt anything to have both v2.0 and v2.1 in the filesystem at the same time. The big change happens when v2.1 goes online, and by that time, the TxF transaction has been committed. The zero downtime really happens because of the way IIS moves from an old AppPool to a new one, not because of TxF. – RickNZ Aug 29 '10 at 01:52
  • 5
    Another problem with this is if a large amount of user data is stored in subfolders of the app folders. – Kenny Evitt Sep 06 '11 at 18:44
  • won't work when there are dependencies outside of the web app (like a db) because IIS processes old and new requests IN PARALLEL – Jack Sep 17 '12 at 12:05
  • any full source code script sample about it? Powershell maybe – Kiquenet Dec 12 '12 at 07:36
    @Kenny Evitt you can avoid putting user data inside the app folder and its subfolders, and store it outside the app folder instead. You can use a Web.config app key to manage it between different environments =) – Akira Yamamoto Jan 18 '13 at 18:38
  • 1
    Powershell script for zero downtime deployment using ARR in a single machine: https://github.com/yosoyadri/IIS-ARR-Zero-Downtime/blob/master/DeployLocalFarm.ps1 – Yosoyadri Mar 04 '13 at 10:57
  • When this causes an app-restart, will it forcefully stop any existing requests, or will it gracefully do so? – Sam Jun 07 '13 at 05:25
    @Quandary session should always be out of process - never use InProc. – George Tsiokos Mar 17 '14 at 18:19
  • Can anyone comment on TxF support in MsDeploy and how to enable it? I cannot seem to find any information on this. – arni Sep 21 '14 at 16:50
  • 5
    This is not 0 second deployment because the new app needs to start up. – usr Jan 07 '15 at 14:44
20

You can achieve zero-downtime deployment on a single server by using Application Request Routing in IIS as a software load balancer between two local IIS sites on different ports. This is known as a blue-green deployment strategy, where only one of the two sites is available in the load balancer at any given time. Deploy to the site that is "down", warm it up, and bring it into the load balancer (usually by passing an Application Request Routing health check); then take the original site that was up out of the "pool" (again, by making its health check fail).
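
The warm-up step matters because the "down" site starts cold. A rough sketch of that step (the port and URLs are assumptions, not part of the tutorial):

// WarmUp.cs -- hit the idle (green) site directly on its port until it
// answers, so the first real user doesn't pay the startup cost.
using System;
using System.Net;
using System.Threading;

class WarmUp
{
    static void Main()
    {
        var urls = new[]
        {
            "http://localhost:8081/",          // the idle site's port (assumed)
            "http://localhost:8081/login",     // any hot paths worth priming
        };

        foreach (var url in urls)
        {
            for (int attempt = 0; attempt < 10; attempt++)
            {
                try
                {
                    using (var response = (HttpWebResponse)WebRequest.Create(url).GetResponse())
                    {
                        Console.WriteLine("{0} -> {1}", url, response.StatusCode);
                        break;                 // warmed up; move to the next URL
                    }
                }
                catch (WebException)
                {
                    Thread.Sleep(2000);        // app still starting; retry
                }
            }
        }
    }
}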

A full tutorial can be found here.

kavun
9

I went through this recently and the solution I came up with was to have two sites set up in IIS and to switch between them.

For my configuration, I had a web directory for each of the A and B sites, like this:

  • c:\Intranet\Live A\Interface
  • c:\Intranet\Live B\Interface

In IIS, I have two identical sites (same ports, authentication, etc.), each with its own application pool. One of the sites is running (A) and the other is stopped (B). The live one also has the live host header.

When it comes time to deploy to live, I simply publish to the stopped site's location. Because I can access the B site using its port, I can pre-warm it so the first user doesn't trigger an application start. Then, using a batch file, I copy the live host header to B, stop A, and start B.
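
The batch file's job can also be done with Microsoft.Web.Administration. A sketch under assumed site names and bindings (not Rob's actual script):

// SwapSites.cs -- move the live host header from site A to site B,
// then stop A and start B.
using Microsoft.Web.Administration;

class SwapSites
{
    static void Main()
    {
        using (var iis = new ServerManager())
        {
            var a = iis.Sites["Live A"];
            var b = iis.Sites["Live B"];

            // Binding format is "ip:port:hostheader"; assumes the live
            // binding is the first one on A.
            var live = a.Bindings[0];
            a.Bindings.Remove(live);
            b.Bindings.Add("*:80:intranet.example.com", "http");

            iis.CommitChanges();

            a.Stop();    // the stopped site keeps the old bits for rollback
            b.Start();
        }
    }
}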

Rob King
  • 1
    This helps with downtime due to file copy, but has the same issue as @Sklivvz -- as soon as the code roll has structural changes to the database the site goes boom. – EBarr Aug 26 '10 at 22:05
  • This seemed like the intuitive way to me as well, but why isn't there an easy, built-in way to do this? – Petrus Theron Mar 09 '12 at 12:23
  • 3
    @Ebarr then don't roll out destructive sql changes. For example, if you need to remove a column, do so in the next release when it's no longer used by A or B. – Bealer Dec 19 '13 at 17:40
  • @Bealer -- agreed (with a caveat). There is a whole series of these questions on "downtime during code rolls". I have yet to find one that really discusses the realities of evolving a DB schema. Caveat: there are a variety of complications that come along with two-phase changes to a schema. One example -- many ORMs barf if the table definition differs from the definition as they understand it (new or missing columns). – EBarr Dec 19 '13 at 18:43
  • 2
    @Rob how can you "pre-warm" the site if it is stopped? – Andrew Gee Jan 29 '14 at 15:37
  • I like this approach for a single-server setup. It may not be up to you what the infrastructure looks like. – testpattern Jan 07 '16 at 10:34
  • 2
    @Rob "copy the live host header to B" Can you explain this? – Justin J Stark Mar 15 '18 at 19:56
8

OK so since everyone is downvoting the answer I wrote way back in 2008*...

I will tell you how we do it now in 2014. We no longer use Web Sites because we are using ASP.NET MVC now.

We certainly do not need a load balancer and two servers to do it; that's fine if you have three servers for every website you maintain, but it's total overkill for most websites.

Also, we don't rely on the latest wizard from Microsoft - too slow, too much hidden magic, and too prone to changing its name.

Here's how we do it:

  1. We have a post-build step that copies generated DLLs into a 'bin-pub' folder.

  2. We use Beyond Compare (which is excellent**) to verify and sync changed files (over FTP, because that is widely supported) up to the production server.

  3. We have a secure URL on the website containing a button which copies everything in 'bin-pub' to 'bin' (taking a backup first to enable quick rollback). At this point the app restarts itself. Then our ORM checks if there are any tables or columns that need to be added and creates them.
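
As an illustration of step 3, the copy-with-backup might look roughly like this (paths are assumptions; the original runs behind a secured URL on the site itself):

// PromoteBinPub.cs -- back up 'bin', then copy everything from 'bin-pub'
// over it. Overwriting bin\*.dll makes ASP.NET restart the app on its own.
using System;
using System.IO;

class PromoteBinPub
{
    static void Main()
    {
        string bin    = @"C:\inetpub\mysite\bin";      // assumed site paths
        string binPub = @"C:\inetpub\mysite\bin-pub";
        string backup = bin + ".backup-" + DateTime.Now.ToString("yyyyMMddHHmmss");

        // Backup first, so rollback is a simple copy back.
        Directory.CreateDirectory(backup);
        foreach (string file in Directory.GetFiles(bin))
            File.Copy(file, Path.Combine(backup, Path.GetFileName(file)));

        // Promote the new DLLs; the app restarts itself at this point.
        foreach (string file in Directory.GetFiles(binPub))
            File.Copy(file, Path.Combine(bin, Path.GetFileName(file)), true);
    }
}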

That is only milliseconds of downtime. The app restart can take a second or two, but during the restart requests are buffered, so there is effectively zero downtime.

The whole deployment process takes anywhere from 5 seconds to 30 minutes, depending on how many files are changed and how many changes there are to review.

This way you do not have to copy an entire website to a different directory but just the bin folder. You also have complete control over the process and know exactly what is changing.

**We always do a quick eyeball of the changes we are deploying - as a last-minute double check, so we know what to test and are ready if anything breaks. We use Beyond Compare because it lets you easily diff files over FTP. I would never do this without BC; otherwise you have no idea what you are overwriting.

*Scroll to the bottom to see it :( BTW I would no longer recommend Web Sites because they are slower to build and can crash badly with half-compiled temp files. We used them in the past because they allowed more agile file-by-file deployment - very quick to fix a minor issue, and you can see exactly what you are deploying (if using Beyond Compare of course - otherwise forget it).

mike nelson
7

Using Microsoft.Web.Administration's ServerManager class you can develop your own deployment agent.

The trick is to change the PhysicalPath of the VirtualDirectory, which results in an online atomic switch between old and new web apps.

Be aware that this can result in old and new AppDomains executing in parallel!

The problem is how to synchronize changes to databases etc.

By polling for the existence of AppDomains with old or new PhysicalPaths it is possible to detect when the old AppDomain(s) have terminated, and if the new AppDomain(s) have started up.

To force an AppDomain to start, you must make an HTTP request (IIS 7.5 supports an Autostart feature).

Now you need a way to block requests for the new AppDomain. I use a named mutex - which is created and owned by the deployment agent, waited on by the Application_Start of the new web app, and then released by the deployment agent once the database updates have been made.

(I use a marker file in the web app to enable the mutex-wait behaviour.) Once the new web app is running, I delete the marker file.
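
A minimal sketch of the web-app side of that handshake (the mutex name and marker path are assumptions, not Jack's actual code):

// Global.asax.cs -- the deployment agent creates and owns the named mutex;
// the new web app waits on it in Application_Start, and the agent releases
// it once the database updates are done.
using System;
using System.IO;
using System.Threading;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        // The marker file tells this (new) build to wait for the agent.
        if (File.Exists(HttpRuntime.AppDomainAppPath + "deploy.marker"))
        {
            using (var gate = new Mutex(false, @"Global\MyAppDeployGate"))
            {
                gate.WaitOne();        // blocks until the agent releases it
                gate.ReleaseMutex();   // we only needed to wait, not to hold it
            }
        }
    }
}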

Jack
5

The only zero downtime methods I can think of involve hosting on at least 2 servers.

Sam Meldrum
1

I would refine George's answer a bit, as follows, for a single server:

  1. Use a Web Deployment Project to pre-compile the site into a single DLL
  2. Zip up the new site, and upload it to the server
  3. Unzip it to a new folder located in a folder with the right permissions for the site, so the unzipped files inherit the permissions correctly (perhaps e:\web, with subfolders v20090901, v20090916, etc)
  4. Use IIS Manager to change the folder the site points to
  5. Keep the old folder around for a while, so you can fall back to it in the event of problems

Step 4 will cause the IIS worker process to recycle.

This is only zero downtime if you're not using InProc sessions; use SQL mode instead if you can (even better, avoid session state entirely).

Of course, it's a little more involved when there are multiple servers and/or database changes....

RickNZ
  • 1
    Same issue as @Sklivvz -- This method falls down as soon as the code roll has structural changes to the database. – EBarr Aug 26 '10 at 22:04
  • 4
    That's why I said it was more involved when there are DB changes... Rolling out code with structural changes to the DB is not just a deployment issue; there also has to be support in the code, and probably in the DB too. – RickNZ Aug 29 '10 at 01:48
1

To expand on sklivvz's answer, which relied on having some kind of load balancer (or just a standby copy on the same server):

  1. Direct all traffic to Site/Server 2
  2. Optionally wait a bit, to ensure that as few users as possible have pending workflows on the deployed version
  3. Deploy to Site/Server 1 and warm it up as much as possible
  4. Execute database migrations transactionally (strive to make this possible)
  5. Immediately direct all traffic to Site/Server 1
  6. Deploy to Site/Server 2
  7. Direct traffic to both sites/servers

It is possible to introduce a bit of smoke testing, by creating a database snapshot/copy, but that's not always feasible.

If possible and needed, use "routing differences", such as different tenant URLs (customerX.myapp.net) or different users, to deploy to an unknowing group of guinea pigs first. If nothing fails, release to everyone.

Since database migrations are involved, rolling back to a previous version is often impossible.

There are ways to make applications play nicer in these scenarios, such as using event queues and playback mechanisms, but since we're talking about deploying changes to something that is in use, there's really no foolproof way.
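
For step 4 above, a minimal sketch of a transactional migration, assuming SQL Server and plain ADO.NET (the connection string and DDL are illustrative):

// MigrateDb.cs -- either every schema change lands or none do, so the
// other site never sees a half-applied schema.
using System.Data.SqlClient;

class MigrateDb
{
    static void Main()
    {
        using (var conn = new SqlConnection("Server=.;Database=MyApp;Integrated Security=true"))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                try
                {
                    // Additive change only: the old version keeps working
                    // while both sites are briefly live.
                    new SqlCommand(
                        "ALTER TABLE Orders ADD TrackingCode nvarchar(50) NULL",
                        conn, tx).ExecuteNonQuery();

                    tx.Commit();
                }
                catch
                {
                    tx.Rollback();
                    throw;
                }
            }
        }
    }
}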

gliljas
1

This is how I do it:

Absolute minimum system requirements:
1 server with

  • 1 load balancer/reverse proxy (e.g. nginx) running on port 80
  • 2 ASP.NET-Core/mono reverse-proxy/fastcgi chroot-jails or docker-containers listening on 2 different TCP ports
    (or even just two reverse-proxy applications on 2 different TCP ports without any sandbox)

Workflow:

start transaction myupdate

try
    Web-Service: Tell all applications on all web-servers to go into primary read-only mode 
    Application switch to primary read-only mode, and responds 
    Web sockets begin notifying all clients 
    Wait for all applications to respond

    wait (custom short interval)

    Web-Service: Tell all applications on all web-servers to go into secondary read-only mode 
    Application switch to secondary read-only mode (data-entry fuse)
    Updatedb - secondary read-only mode (switches database to read-only)

    Web-Service: Create backup of database 
    Web-Service: Restore backup to new database
    Web-Service: Update new database with new schema 

    Deploy new application to apt-repository 
    (for windows, you will have to write your own custom deployment web-service)
    ssh into every machine in array_of_new_webapps
    run apt-get update
    then either 
    apt-get dist-upgrade
    OR
    apt-get install <packagename>
    OR 
    apt-get install --only-upgrade <packagename>
    depending on what you need
    -- This deploys the new application to all new chroots (or servers/VMs)

    Test: Test new application under test.domain.xxx
    -- everything that fails should throw an exception here
    commit myupdate;

    Web-Service: Tell all applications to send web-socket request to reload the pages to all clients at time x (+/- random number)
    @client: notify of reload and that this causes loss of unsaved data, with option to abort 

    @ time x:  Switch load balancer from array_of_old_webapps to array_of_new_webapps 
    Decommission/Recycle array_of_old_webapps, etc.

catch
        rollback myupdate 
        switch to read-write mode
        Web-Service: Tell all applications to send web-socket request to unblock read-only mode
end try 
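
The read-only "fuse" the workflow relies on could be as small as a process-wide flag that the update web-service flips on every app server. A hedged sketch (all names are assumptions, not the author's code):

// ReadOnlyMode.cs -- a flag checked before any write path runs.
using System;
using System.Threading;

public static class ReadOnlyMode
{
    private static int _level;   // 0 = read-write, 1 = primary, 2 = secondary

    public static int Level
    {
        get { return Thread.VolatileRead(ref _level); }
    }

    // Called by the deployment web-service on every app server.
    public static void Set(int level)
    {
        Interlocked.Exchange(ref _level, level);
    }

    // Call at the top of every write path (the data-entry fuse).
    public static void EnsureWritable()
    {
        if (Level > 0)
            throw new InvalidOperationException(
                "The system is in read-only mode for an update; please retry shortly.");
    }
}
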
Stefan Steiger
0

A workaround with no downtime that I use regularly is:

  1. Rename the running .NET Core application DLL to filename.dll.backup

  2. Upload the new .dll (the web application stays available and keeps serving requests while the file is being uploaded)

  3. Once the upload is complete, recycle the application pool. This requires either RDP access to the server, or a recycle function in your hosting control panel.

IIS overlaps the app pool when recycling, so there usually isn't any downtime during a recycle. Requests keep coming in without ever knowing the app pool has been recycled, and they are served seamlessly with no downtime.
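
The recycle itself can be scripted instead of done over RDP. A sketch using Microsoft.Web.Administration (the pool name is an assumption):

// RecyclePool.cs -- with overlapped recycle enabled, IIS starts the new
// worker process before retiring the old one.
using Microsoft.Web.Administration;

class RecyclePool
{
    static void Main()
    {
        using (var iis = new ServerManager())
        {
            iis.ApplicationPools["MyAppPool"].Recycle();
        }
    }
}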

I am still searching for a better method than this! :)

Bharat Vasant
0

IIS/Windows

After trying every possible solution we use this very simple technique:

  1. IIS application points to a folder /app that is a symlink (!) to /app_green
  2. We deploy the app to /app_blue
  3. We change the symlink to point to /app_blue (the app keeps working)
  4. We recycle the application pool

Zero downtime, but the app does choke for 3-5 seconds (JIT compilation and other initialization tasks)

Someone called it a "poor man's blue-green deployment" without a load balancer.
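
A sketch of steps 3-4 (the paths and pool name are assumptions; .NET Framework has no symlink API, so this shells out to cmd.exe and must run elevated):

// SwapSymlink.cs -- flip the /app symlink to the freshly deployed folder,
// then recycle the pool.
using System.Diagnostics;
using Microsoft.Web.Administration;

class SwapSymlink
{
    static void Main()
    {
        // rmdir on a directory symlink removes only the link, not the target.
        Process.Start("cmd.exe", @"/c rmdir C:\sites\app").WaitForExit();
        Process.Start("cmd.exe", @"/c mklink /D C:\sites\app C:\sites\app_blue").WaitForExit();

        using (var iis = new ServerManager())
        {
            iis.ApplicationPools["MyAppPool"].Recycle();   // step 4
        }
    }
}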

Nginx/linux

On nginx/linux we use "proper" blue-green deployment:

  1. nginx reverse proxy points to localhost:3000
  2. we deploy to localhost:3001
  3. warmup the localhost:3001
  4. switch the reverse proxy
  5. shut down localhost:3000

(or use docker)

Both the Windows and Linux solutions can be easily automated with powershell/bash scripts and invoked via GitHub Actions or a similar CI/CD engine.

Alex from Jitbit
-8

I would suggest keeping the old files there and simply overwriting them. That way the downtime is limited to single-file overwrite times and there is only ever one file missing at a time.

Not sure this helps in a "web application" though (I think you are saying that's what you're using), which is why we always use "web sites". Also, with "web sites", deploying doesn't restart your site and drop all the user sessions.

mike nelson