24

I have a very weird issue with an Azure Web App, and I'm getting quite frustrated with it.

Our app is very fast and responsive while we are using it. However, if we don't use it for roughly ten minutes, it has a very cold start (~10-20 seconds). This cold start only seems to happen when the database is involved. It's a bit like what happens right after we release the web app.

Our attempts

Using Application Insights inside Azure, we have set up this ping every 5 minutes:

enter image description here

The outliers are always caused by my deploys (not using deployment slots right now). However, this login page does not call our database so we don't see the "cold" start in these data.

The application setup should be solid. Our web app is hosted in North Europe with Always on:

enter image description here

We just moved the whole setup to a new resource group / app service plan, to make sure our problem was not tangled up with our other apps. The new app service plan is a Standard 1 small, which should not be a problem. Looking at our consumption I am not worried, and I could probably even try a smaller plan, which I will do after solving our problem:

enter image description here

Our SQL database is also hosted in North Europe (checked locations a billion times because I've made that mistake before).

Just like with the app service, we've picked "too big" hardware to make sure that is not causing the problem (the Standard S0: 10 DTUs). The usage is ridiculously low:

enter image description here

We do use continuous deployment (Deployment options inside the Azure menu), but looking at the deployments, it should not constantly deploy something:

enter image description here

The frustrating part is that the app is super responsive when it works. When it's "warm", every page loads in seconds, just like the average response time shown on our web app:

enter image description here

But these numbers are just plain wrong when we (or our users!) use our app. In practice, the first load very often takes 10-20+ seconds.

Does anyone have ANY idea? Any hints? You've no idea how grateful I will be.

EDIT & UPDATE 1:

I have decided to set up some more tests. I've now managed to get real data showing our problem by calling another page. Ironically, this page does NOT call the database, so while I thought this was a database problem, it does not seem to be. See the challenge here (the trend continues for 24+ hours).

It's weird how stable it is at exactly ~10 seconds. And the trend does not seem to be every 10-20 minutes, but closer to every 5 minutes, with exactly the same interval between the spikes:

enter image description here

EDIT & UPDATE 2:

I've been digging some more. It turns out there are a couple of very interesting insights: the "slow" 11-second calls from edit 1 come only from East US and from one endpoint (http://prntscr.com/jcv69w), and

The most interesting thing I found is the following:

The application itself does NOT have any caching. I use Entity Framework, which I assume uses some caching, but that's all.

I was logged into our app and clicked around in Chrome. I found that the pages I had already visited were shown instantly (with data from the DB), but if I opened a new page, it would load slowly. It seemed like some entities are being cached the first time I open a page.

I then tried to open the app in a new browser. If I opened a page I had previously opened in Chrome, it would open instantly. If I opened a new page I hadn't clicked before, it would have the ~10 second load.

My best guess right now is that the Entity Framework I use is giving problems for some reason.

EDIT 3:

I just added a bounty and am setting up a lot of logging. I have added MiniProfiler, but have trouble getting it to work in production (it is only shown on local requests).

I have also added logging in global.asax for Application_Start, Application_BeginRequest and Application_EndRequest to see some timing and status information there. Will update with findings soon.

EDIT 4:

So now I've the first interesting numbers in. The app is not being recycled. Application_Start is only called once.

I can see the time difference by logging on BeginRequest and EndRequest. There are multiple calls that take more than 15 seconds between these two. But when the site is warm, a request takes ~0.5-2 seconds depending on the page. So something very weird is happening between the beginning and the end of the request. Debugging further!
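A minimal sketch of what such BeginRequest/EndRequest timing could look like in Global.asax.cs, assuming a classic ASP.NET MVC application on System.Web. The `TimerKey` name and the use of `System.Diagnostics.Trace` as the log target are illustrative assumptions, not the logging actually used in the app:

```csharp
using System;
using System.Diagnostics;
using System.Web;

public class MvcApplication : HttpApplication
{
    // Hypothetical key under which the per-request stopwatch is stored.
    private const string TimerKey = "RequestTimer";

    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        // One stopwatch per request, kept in the request-scoped Items bag.
        Context.Items[TimerKey] = Stopwatch.StartNew();
    }

    protected void Application_EndRequest(object sender, EventArgs e)
    {
        var timer = Context.Items[TimerKey] as Stopwatch;
        if (timer == null)
        {
            return; // BeginRequest did not run for this request.
        }

        timer.Stop();

        // Log the URL and the elapsed time between BeginRequest and EndRequest.
        System.Diagnostics.Trace.TraceInformation(
            "{0} took {1} ms", Request.RawUrl, timer.ElapsedMilliseconds);
    }
}
```

Keeping the stopwatch in `Context.Items` rather than in a field matters because one `HttpApplication` instance serves many requests over its lifetime, so per-request state should travel with the request.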

EDIT 5:

Got MiniProfiler to work. Here is an example of the slow load (~15 seconds):

enter image description here

My next step is adding Entity Framework tracing and some more line-by-line timing. I'm putting my money on the database!

EDIT 6:

Okidoki, I was wrong. It's the render method that's slow, not the database! I've NO idea how to debug this... To Google!

enter image description here

EDIT 7:

Time for another update. Status is: nothing has been solved.

So I have tried a lot of things:

1) I tried to disable all types of caching (Prevent Caching in ASP.NET MVC for specific actions using an attribute) and I get the same behavior. First load? Slow. Next load? Fast. Wait 5-10 minutes and it is the same behavior, so not solved.

2) I had some custom things in my startup.auth file with a 5 minute delay. Removed. Not the problem.

3) I used a custom attribute for authorization. I removed that.

4) I updated my Entity Framework implementation so that the context is created per request.

I'm getting really frustrated. My next step is:

A) Try to make 5-10 versions of the same page (without _layout, with layout, with database, without database, with dependency injection, without... all these things), to see if I can find a pattern.

B) Try moving the hosting to a virtual machine to see if it solves the problem

EDIT 8 - NEW RELIC ADDED:

I have now added New Relic. Two very scary things are the following (I found and reproduced the error!):

enter image description here

And frontend-wise (the Browser part of New Relic), there is a ~15s gap between two starts:

http://prntscr.com/jevgeg vs http://prntscr.com/jevgix with nothing in between.

Lars Holdgaard
  • https://github.com/projectkudu/kudu/issues/2583 This is an issue we had with deployment slots regarding cold starts. TL;DR: try making the app HTTPS only, following the instructions from https://blogs.msdn.microsoft.com/benjaminperkins/2017/11/30/how-to-make-an-azure-app-service-https-only/ – GuruCharan94 May 01 '18 at 12:24
  • Before that can you please check if the app insights ping URL and the URL your users are using are both http or both https. I'm guessing one is http and other is https. Wild guess – GuruCharan94 May 01 '18 at 12:30
  • @GuruCharan94 The app is already HTTPS only and the application insight calls the http version. Unfortunately :) – Lars Holdgaard May 01 '18 at 12:32
  • If HTTPS only setting is ON, then the appinsights ping should auto redirect to the HTTPS link. But this "Always On" feature should take care of cold starts irrespective of app insights. Can't think of any other solution at the moment... – GuruCharan94 May 01 '18 at 13:05
  • Can you show us the model and view code? – Ionut Ungureanu May 03 '18 at 15:35
  • If the render is slow, use the F12 console in your browser to break it down further and work out why it's taking so long. Maybe there's a CDN that always takes this long or a bundle or something. – Nick.Mc May 04 '18 at 03:31
  • Is it only when you deploy your solution over to Azure or when you run it in your development environment as well? – NitinSingh May 04 '18 at 06:03
  • EF doesn't really cache data by itself. (there are some internal optimizations that make a difference between cold/warm lookups though) – Jim Wolff May 04 '18 at 06:39
  • Looks like you're going to need more detail than MiniProfiler provides. Try swapping in [Glimpse](https://getglimpse.com/) for more detail about server-side speed. You could also try introducing your own tracing using System.Trace to track down the actual steps where things grind to a halt – Josh May 05 '18 at 22:46
  • Can you try disabling the caching of pages from MVC? Just disable page caching and see what happens. I think all your loads will take a long time. This will confirm that the problem is on the application side rather than on the hosting side. – harishr May 07 '18 at 03:51
  • Did you try a tool to ping it periodically? Like UptimeRobot? I've got the same problem. After 10 minutes or so, even with "always on" the app gets cold. The same page, with no changes, can take 10 seconds when normally it takes .5 second. – George Beier Feb 21 '19 at 02:27
  • @LarsHoldgaard do you recognize an increase of the cold start up time since April too? – Falco Alexander Apr 08 '19 at 08:22

6 Answers

4

I have a couple of possible answers.

Entity Framework code-first/database initialization: If you are using a code-first setup with migrations and possibly seed data, each of these things can cause some "warmup" issues.

This is especially true if you are not initializing the database on app startup, because then the first time you hit your database is also when it gets initialized.

Entity Framework version: Entity Framework itself has also had a lot of performance improvements in 5 and 6.x, some of which affect both cold and warm startup speeds.

Views aren't precompiled: If the first hit on each new page (view) is slow, for example after a deployment, and subsequent loads are fine, this can be because the views aren't compiled. I can elaborate on that if that's the case.

Recycle: It sounds like you are experiencing these issues when the application recycles and does not auto-initialize, which is why you get that cold hit. The worst performance issues I have seen in this area are usually related to Entity Framework and view precompilation, and both can be fixed easily. Ensuring the app is "always running" and self-initializes after a recycle also ensures that no users get this cold hit.
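For the database-initialization point above, a minimal sketch of forcing Entity Framework to initialize during application start rather than on the first user request. It assumes Entity Framework 6 on classic ASP.NET; `AppDbContext` is a hypothetical stand-in for the application's real context, and whether this helps depends on how much work the configured initializer actually does:

```csharp
using System.Data.Entity;
using System.Web;

// Hypothetical context; stands in for the application's real DbContext.
public class AppDbContext : DbContext
{
}

public class MvcApplication : HttpApplication
{
    protected void Application_Start()
    {
        using (var db = new AppDbContext())
        {
            // Runs the registered database initializer (migrations, seeding)
            // now. With force: false it is skipped if it has already run
            // for this context type in the current app domain.
            db.Database.Initialize(force: false);
        }
    }
}
```

This moves the one-time initialization cost into startup, so it is paid after a deploy or recycle instead of by whoever happens to make the first database-backed request.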

UPDATE: Since it was view related, I can offer a solution I have found very useful: install the RazorGenerator.Mvc NuGet package and add its engine as the first view engine. That ensures you use compiled views.

In App_Start you could create a file called RazorGeneratorMvcStart.cs with content like this:

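using System.Web.Mvc;
using System.Web.WebPages;
using RazorGenerator.Mvc;

[assembly: WebActivatorEx.PostApplicationStartMethod(typeof(MyNamespace.RazorGeneratorMvcStart), "Start")]

namespace MyNamespace {
    public static class RazorGeneratorMvcStart {
        public static void Start() {
            // Serve views from the precompiled assembly instead of compiling .cshtml at runtime.
            var engine = new PrecompiledMvcEngine(typeof(RazorGeneratorMvcStart).Assembly);

            // Put the precompiled engine first so it wins over the default Razor engine.
            ViewEngines.Engines.Insert(0, engine);

            // Start page (_ViewStart) lookups are done by WebPages, so register the engine there too.
            VirtualPathFactoryManager.RegisterVirtualPathFactory(engine);
        }
    }
}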

The engine also has a UsePhysicalViewsIfNewer setting for those who like replacing a view live. With it enabled, the precompiled version is used unless a view with a newer date than the compiled .dll has been placed in the folder.

This approach should solve performance issues with views.

Jim Wolff
  • I can elaborate on any of those things but probably need a bit more information first. – Jim Wolff May 04 '18 at 06:40
  • Thanks for the reply! I will get to test this after some morning meetings! – Lars Holdgaard May 04 '18 at 07:26
  • @LarsHoldgaard this update might solve your headaches more permanently, it has solved that exact pain for me before. You can look into [RazorGenerator](https://github.com/RazorGenerator/RazorGenerator) to find out more – Jim Wolff May 14 '18 at 10:46
2

A few ideas:

  1. On your Web App blade, go to the Diagnose and solve problems menu. Then click on performance counters. I'd honestly fish through every one of the available perf counters, paying attention to the timeline vs your degraded performance. I once found out that SignalR was choking out my server due to runaway connections by looking at Thread Count.

  2. Is the server errors log in Application Insights clean?

  3. Under the Diagnose and Solve Problems screen, do you see anything suspicious in the Failed Request Tracing Logs?

Rob Reagan
  • Hi Rob! Thanks a lot for taking your time to come with some ideas! 1) Everything seems shockingly clean. If you see my updated answer you can see I found a "proof" of the slow requests. In diagnostics I cannot see the challenges: http://prntscr.com/jcs28v . & http://prntscr.com/jcs2hf & http://prntscr.com/jcs2k7 . I am waiting for collecting the rest 2) Yea, it's clean: http://prntscr.com/jcs1zw 3) None, unfortunately. Seems empty - http://prntscr.com/jcs4vs – Lars Holdgaard May 02 '18 at 14:05
  • Are we talking about a solution that only consists of a Web App and a SQL DB, no other dependencies? – Rob Reagan May 02 '18 at 14:06
  • Yes. There are other dependencies (a few microservices and an Azure Storage account), but not on the pages that show a slow load, so I am a bit unsure how they could influence this – Lars Holdgaard May 02 '18 at 14:10
2
  1. Separate deployment issues from local ones. If the application runs perfectly in the local environment, then something different is happening when it goes to Azure, and resolving something there is taking a lot of time.
  2. Any static resource (scripts, styles, etc.) gets cached automatically by the browser on the first request, so the issue should not appear on subsequent requests.
  3. Since you know that the "Render" method is causing the problem, it looks like complex nested computation and browser DOM manipulation is happening. However, if this is not a problem locally, check whether rendering calls outside resources and whether those calls are blocking (this may need heavy use of AJAX or server-side async calls, although I assume you already have that in place).
  4. Refactoring may be required (smaller stacks and possibly threading/non-blocking IO), but I would need to see your code to suggest anything there.

Share your page load event and any subsequent calls made by it, and also how the view is rendered, along with any DOM handling.

NitinSingh
2

There are two more options to find and fix.

1. Use New Relic to trace.

Check this post from hanselman on using it in Azure.

2. Follow Asp.Net MVC Best practices

Check this post for Best practices.

Also, for data reads you can use ADO.NET with stored procedures instead of Entity Framework, because as of now EF may have some performance issues.

Shaiju T
  • Entity Framework doesn't have performance _issues_ per se. There are a lot of misleading posts and bad practices out there. As long as you are aware of things like using `.AsNoTracking()` you're good. Faster alternatives exist, but they are not as feature rich. EF can also execute stored procedures or raw SQL. – Jim Wolff May 14 '18 at 10:28
2

I am posting an answer even though it's not solved, but I'm 99% sure I've found the underlying problem.

What happens is that when I release, every view needs to be built. It takes ~15 seconds to build a view, which is what New Relic also shows in my latest update.

This brings two temporary solutions and one bigger question: why is the building of the view so slow?

The temporary solutions are simple. Either compile the views when releasing or visit the most important pages just after a release. This is obviously annoying because I release multiple times per day.
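The "visit the most important pages just after a release" option could be scripted along these lines. This is only a sketch: the base address and the page paths are hypothetical placeholders, and it assumes a console project with C# 7.1 or later (for `async Main`). Pages behind a login would just redirect an anonymous client, so they would need authentication added before their views actually get compiled:

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading.Tasks;

public static class WarmUp
{
    public static async Task Main()
    {
        // Hypothetical site and pages; replace with the real ones.
        var baseAddress = new Uri("https://example.com");
        var paths = new[] { "/", "/Account/Login", "/Dashboard" };

        using (var client = new HttpClient())
        {
            client.BaseAddress = baseAddress;
            // First hits can be slow, so allow generous time per request.
            client.Timeout = TimeSpan.FromMinutes(2);

            foreach (var path in paths)
            {
                var timer = Stopwatch.StartNew();
                using (var response = await client.GetAsync(path))
                {
                    // Report the status code and how long the first hit took.
                    Console.WriteLine("{0} -> {1} in {2} ms",
                        path, (int)response.StatusCode, timer.ElapsedMilliseconds);
                }
            }
        }
    }
}
```

Running something like this as the last step of a release pays the view-compilation cost once, before real users arrive, and the printed timings show which pages are the slow ones.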

The reason it's so slow, I assume, is that I am using a very big Bootstrap theme. The way I am handling the bundles is not very efficient, which could be part of the problem.

The reason I thought the site became slow after ~10 minutes was simply that I released new code quite often without visiting a large portion of our pages. Once a page has been visited after a release, it's quick.

Thanks a lot for your help and support - at least now I can deal with it.

Lars Holdgaard
  • Lars, have you considered specifying pages to warm up in your web.config, then coupling your deployment with a slot swap in your CI/CD setup? That could solve your problems I think. – Rob Reagan May 10 '18 at 13:54
1

Are you compiling Less or Sass in your bundling setup?
If yes, what JavaScriptEngineSwitcher are you using?
I remember having a similar problem. The bundles are compiled on first access and it was taking a huge amount of time.
The solution was to switch to the V8 engine.

Attila Szasz
  • Not sure. The Gruntfile compiles the CSS based on my SASS files, and the bundles only include the CSS results. I doubt that it is, but of course.. My grunt file could in theory do some stuff I didn't think about: http://prntscr.com/jf6d5c – Lars Holdgaard May 08 '18 at 08:43
  • For me, the Less files were included in the bundle and compiled on first access. That doesn't look like the case for you. – Attila Szasz May 08 '18 at 08:46