
I have an ActionResult that I think is pretty heavy, so I wonder how I can optimize it for better performance. This web application will be used by 100,000+ users at the same time.

Right now my ActionResult does the following things:

  • Retrieves an XML file from an internet URL
  • Fills the XML data into my DB
  • Fills my ViewModel from the DB data
  • Returns the model to the view

These four steps run every time a user visits the view, which is why I think this ActionResult is badly designed.

How can I add the following to my ActionResults?

Add a timer to retrieve the XML file and fill the DB with it, say every 10 minutes, so those steps don't run every time a user visits the view. The only things that need to happen on every visit are the ViewModel binding and returning the model. How can I accomplish this?

Note:

  • The XML file gets updated with new data every 10 minutes or so.
  • I have around 50 ActionResults that do the same get-XML-and-add-to-database work, but with 50 different XML files.
  • If the XML URL is offline, it should skip the whole XML retrieval and DB insert and just do the model binding.

This is my ActionResult:

public ActionResult Index()
{
    // Get data from xml url (this is the code that should not run every time a user visits the view)
    var url = "http://www.interneturl.com/file.xml";
    XNamespace dcM = "http://search.yahoo.com/mrss/";
    var xdoc = XDocument.Load(url);
    var items = xdoc.Descendants("item")
        .Select(item => new
        {
            Title = item.Element("title").Value,
            Description = item.Element("description").Value,
            Link = item.Element("link").Value,
            PubDate = item.Element("pubDate").Value,
            MyImage = (string)item.Elements(dcM + "thumbnail")
                .Where(i => i.Attribute("width").Value == "144" && i.Attribute("height").Value == "81")
                .Select(i => i.Attribute("url").Value)
                .SingleOrDefault()
        })
        .ToList();

    // Fill my db entities with the xml data (this is the code that should not run every time a user visits the view)
    foreach (var item in items)
    {
        var date = DateTime.Parse(item.PubDate);
        if (!item.Title.Contains(":") && !(date <= DateTime.Now.AddDays(-1)))
        {
            News NewsItem = new News();
            Category Category = new Category();
            var CategoryID = 2;

            var WorldCategoryID = re.GetByCategoryID(CategoryID);
            NewsItem.Category = WorldCategoryID;

            NewsItem.Description = item.Description;
            NewsItem.Title = item.Title.Replace("'", "");
            NewsItem.Image = item.MyImage;

            NewsItem.Link = item.Link;
            NewsItem.Date = DateTime.Parse(item.PubDate);
            re.AddNews(NewsItem);
            re.save();
        }
    }

    // All code below this comment needs to run every time a user visits the view
    var GetAllItems = re.GetAllWorldNewsByID();

    foreach (var newsitemz in GetAllItems)
    {
        if (newsitemz.Date <= DateTime.Now.AddDays(-1))
        {
            re.DeleteNews(newsitemz);
            re.save();
        }
    }

    var model = new ItemViewModel()
    {
        NewsList = new List<NewsViewModel>()
    };

    foreach (var NewsItems in GetAllItems)
    {
        FillProductToModel(model, NewsItems);
    }

    return View(model);
}

Right now, every time a user visits the Index view, it gets the XML data and adds it to the DB, so the bad fix I've done in my repository is the following in AddNews:

public void AddNews(News news)
{
    var exists = db.News.Any(x => x.Title == news.Title);

    if (exists == false)
    {
        db.News.AddObject(news);
    }
    else
    {
        db.News.DeleteObject(news);
    }
}

Any kind of solution and info is highly appreciated!

Obsivus

7 Answers


There are a great many things that could be done here: does the file have to be XML (very verbose compared to JSON)? Does it have to be saved to the DB every time?

However, assuming that you have to do every step you have two bottlenecks:

  1. Waiting for the XML file to download/parse
  2. Saving all the XML data to the DB

There are a couple of ways you can speed this up:

Set up a polling interval

If you're happy not seeing updates immediately then you can do something like this:

  • Check the DB for the last update.
  • If (and only if) the last update is more than 10 mins old:
    • Retrieve the XML file from the internet URL
    • Fill the XML data into the DB
  • Fill the ViewModel from the DB data
  • Return the model to the view

This means that your data may be up to 10 mins out of date, but the vast majority of requests will only have to populate the model.
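A minimal sketch of that staleness check, with the time comparison pulled into a pure helper (`re.GetLastUpdate()` below is a hypothetical repository method you would add to record the last successful import; it is not in the question's code):

```csharp
using System;

public static class RefreshPolicy
{
    // True when the stored data is older than the polling interval,
    // i.e. a refresh from the XML feed should be attempted.
    public static bool NeedsRefresh(DateTime lastUpdate, DateTime now, TimeSpan interval)
    {
        return now - lastUpdate > interval;
    }
}

// In the action (sketch):
//
// if (RefreshPolicy.NeedsRefresh(re.GetLastUpdate(), DateTime.Now, TimeSpan.FromMinutes(10)))
// {
//     try { UpdateNewsFromXml(); }           // your existing download + DB-fill code
//     catch (System.Net.WebException) { }    // feed offline: skip and serve the DB copy
// }
```

Wrapping the refresh in a try/catch like this also covers your "if the XML URL is offline" requirement: on failure you simply fall through to the model binding.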

Depending on how you're using this, you could make it even simpler - just add an OutputCache attribute:

[OutputCache(Duration=600)]
public ActionResult Index() { ...

This tells ASP.NET to serve a cached copy and only re-run the action every 10 mins. You can also set the Location property to control whether the result is cached just by the browser or on the server for everyone.
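For example, to keep the cache on the server only (a sketch; OutputCacheLocation is the standard enum from System.Web.UI):

```csharp
// Cached once on the server, shared by all users, for 10 minutes.
[OutputCache(Duration = 600, VaryByParam = "none", Location = System.Web.UI.OutputCacheLocation.Server)]
public ActionResult Index() { ...
```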

Make the XML retrieval async

During the download of the XML file your code is basically just waiting for the URL to load; using the async keyword in C#, the request thread doesn't need to block here.

public async Task<ActionResult> Index()
{
    // Get data from xml url
    string url = "http://www.interneturl.com/file.xml";
    XNamespace dcM = "http://search.yahoo.com/mrss/";

    // The await keyword frees the thread until the slow action completes
    var xdoc = await LoadRemoteXmlAsync(url, dcM);

    // This won't fire until LoadRemoteXmlAsync has finished
    var items = xdoc.Descendants("item")

There's a lot more to using async than I can practically cover here, but if you're on the latest C# and MVC it could be fairly simple to start using it.
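Note that LoadRemoteXmlAsync above is not a framework method; it's a helper you'd write yourself. One possible sketch, assuming .NET 4.5's HttpClient (the namespace parameter from the snippet is only needed later when querying the document, so it's omitted here):

```csharp
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class FeedLoader
{
    // HttpClient is designed to be reused across requests.
    private static readonly HttpClient client = new HttpClient();

    public static async Task<XDocument> LoadRemoteXmlAsync(string url)
    {
        // GetStringAsync frees the request thread while the download runs.
        string xml = await client.GetStringAsync(url);
        return XDocument.Parse(xml);
    }
}
```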

Make only 1 DB call

Your current DB save logic is very sub-optimal:

  • Your code suffers from something commonly called the N+1 problem.
  • Each time you add an item you first query for the title and then insert or delete the record. This is a very slow way to do the update and will make it very difficult to use any indexes to optimise it.
  • You're looping through all of your news articles every time and deleting the old ones one by one. That's much slower than a single delete from News where ... query.

Based on this I'd try the following changes (in rough order of how easy they should be):

  1. Change your AddNews method - if the new data is not newer then don't save any changes for that item.

  2. Change your deletion loop to be a single delete from News where Date <= @yesterday

  3. Look at indexes on the news item title and the date; these appear to be the fields you're querying most.

  4. Look at replacing your AddNews method with something that does an upsert/merge

  5. Does re.GetByCategoryID hit your DB? If so consider splitting that out and either building it into the update query or populating a dictionary to look it up more quickly.

Basically you should have (at most) one DB operation per new news article and one DB operation to delete the old ones. You currently have three per article less than a day old (re.GetByCategoryID + db.News.Any + db.News.AddObject/DeleteObject), another one (re.GetAllWorldNewsByID), and then yet another one per article deleted (re.DeleteNews).
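As a concrete example of collapsing the per-row deletes, the cleanup loop could become one set-based statement (a sketch assuming Entity Framework's ObjectContext, which your AddObject/DeleteObject calls suggest; the table and column names are guesses from your entity names):

```csharp
// One round-trip instead of a SELECT plus one DELETE per old article.
db.ExecuteStoreCommand(
    "DELETE FROM News WHERE Date <= {0}",
    DateTime.Now.AddDays(-1));
```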

Add Profiling

You can add profiling to MVC projects with MiniProfiler, which will tell you exactly how long each step takes and help you find what to optimise. It's used on Stack Overflow and I've used it a lot myself; it will tell you which steps are slowing you down and which aren't worth micro-optimising.

If you don't want to use that, there are optimisation tools in Visual Studio as well as third-party ones like Red Gate ANTS.

Keith
  • Thanks for a great explanation! How exactly can I modify all my DB calls into a single one? :S – Obsivus Jun 04 '13 at 07:56
  • @Rohander It depends on what technologies you're using to connect to your DB and what DB it is. From the names of the methods that you're calling I'd guess Entity Framework, in which case you can use [`ExecuteStoreCommand`](http://msdn.microsoft.com/en-us/library/system.data.objects.objectcontext.executestorecommand.aspx) to execute your SQL directly. I'd also suggest using something like [MiniProfiler](http://miniprofiler.com/) as it will give you timings for each SQL statement. – Keith Jun 04 '13 at 13:29

Move the retrieval of the XML data and population of the database to a back-end process. This way, your action will just retrieve the data from your database and return it.

More specifically,

  1. Create a program that will run in the background (like a Windows service)
  2. In a loop, retrieve your XML data, update your database, then wait for your desired delay period.

For example, you can use the following to delay for 1 minute (60 seconds):

System.Threading.Thread.Sleep(60*1000);

Sleep may not be the best method of delaying for your needs, but it will work as a start.
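Put together, the service's worker loop might look like this (UpdateAllFeeds is a made-up placeholder for your 50 XML-import routines, not an existing method):

```csharp
using System;
using System.Threading;

public class FeedWorker
{
    public void Run()
    {
        while (true)
        {
            try
            {
                UpdateAllFeeds(); // download each XML file and update the DB
            }
            catch (Exception ex)
            {
                // Log and carry on: one dead feed shouldn't kill the service.
                Console.Error.WriteLine(ex);
            }
            Thread.Sleep(TimeSpan.FromMinutes(1));
        }
    }

    private void UpdateAllFeeds()
    {
        // placeholder for the get-xml-and-save-to-db routines
    }
}
```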

Matt Houser
  • Won't work because the xml file gets updated every minute with new xml data – Obsivus May 25 '13 at 14:23
  • Why won't it work? Have your back-end process retrieve it every minute, or every 30 seconds, or every 5 seconds, but take it out of the action. – Matt Houser May 25 '13 at 14:25
  • Oh yeah that will work, How can I accomplish the "retrieve every minute" functionality? – Obsivus May 25 '13 at 14:26
  • I will have around 50 different get-xml-and-add-to-db actions. Will it work with this approach? Isn't caching a good solution otherwise? – Obsivus May 25 '13 at 14:39
  • You can update as much xml data as you want using this approach. How you store the data is up to you. It can be database, memcache, or another caching mechanism. The point though is to separate it from the action method. – Matt Houser May 25 '13 at 14:42

One way to solve this is with ASP.NET output caching.

Output caching allows you to do your action once - the generated page is then cached, so that the action is not performed every time. You can specify the action (or class) to cache, as well as the duration to keep the item cached. When the cached item expires, the action will be run again.

There is a C#/MVC tutorial at the ASP.NET site: Improving Performance with Output Caching.

I would recommend you read the link, but here are some relevant parts:

Imagine, for example, that your ASP.NET MVC application displays a list of database records in a view named Index. Normally, each and every time that a user invokes the controller action that returns the Index view, the set of database records must be retrieved from the database by executing a database query.

If, on the other hand, you take advantage of the output cache then you can avoid executing a database query every time any user invokes the same controller action. The view can be retrieved from the cache instead of being regenerated from the controller action. Caching enables you to avoid performing redundant work on the server.

You enable output caching by adding an [OutputCache] attribute to either an individual controller action or an entire controller class.


Edit - As an example, this would cache your ActionResult for 1 hour:

[OutputCache(Duration = 3600, VaryByParam = "none")]
public ActionResult Index()
{
...
chue x
  • That seems like a good solution. Could you give me an example of how it would look with my ActionResult? – Obsivus May 27 '13 at 19:33
  • I still need to use my entity model with a bunch of entities – Obsivus May 27 '13 at 19:36
  • Cool, so that's all it requires? It will run the ActionResult every hour if I add the OutputCache? – Obsivus May 27 '13 at 20:30
  • @Obsivus - it won't automatically run. The first request to the action after the cache expiration will run the action. – chue x May 28 '13 at 03:50
  • What do you mean? I tried it out and it seems to work; the ActionResult seems to trigger once every hour. I can't debug it or anything, so it seems to be working? – Obsivus May 28 '13 at 06:46
  • @Rohander - Multiple personalities (or names at least)! To answer your question, the cache will only get updated if a user hits the page. In other words, IIS by itself does not update the cache (it only does so if users hit the page). That is all I meant. – chue x May 28 '13 at 13:52

100,000+ concurrent users writing to a SQL database: I think that will be hard to achieve without rethinking the whole infrastructure.

I would agree with Matt that it is best to separate the database writes (an indexer process) from generating results for users. In that case you would have 1 writer and 100,000+ readers, which is still too much for a single server instance, and scaling a relational database is hard. From this perspective I would consider a non-relational persistence solution, especially because your data doesn't look very critical.

Output caching looks like it would work for you, as the information is not user-specific. The question is whether simple output caching is good enough, or whether you need a distributed cache. To know that, you need to count: how often will your cache expire, and how many resources do you need to generate a response for an expired cache? Keep in mind that a single server won't be able to serve 100,000+ concurrent users, so you will have multiple front-end instances, each needing to generate its own result to cache it; that is where a distributed cache would kick in, since the servers could share the generated result.

Giedrius
  • Giedrius, the web application has only one database with two tables, nothing else. It's a very simple web app. I don't see how I can use plenty of databases for such a small database. Do I really need to use indexes if it's just two tables? – Obsivus Jun 18 '13 at 18:16
  • @Rohander - if the application is really simple, that means it should be really simple to create a load test :) My colleagues did an IIS performance test some time ago; with all modules disabled, a single server was able to serve ~10000 static files (a single-pixel image) per second. There's a great free tool for load tests - http://www.loadui.org/ - so I encourage you to do simple load tests to see what kind of performance you can get. – Giedrius Jun 19 '13 at 06:16

You can implement a class that manages execution of your XML processing and DB operations at a chosen time interval. I wouldn't use timers, since I don't think they're necessary: if you have 100,000 users making requests, you can just check on each request whether you need to execute your function.

Here is a class you can use:

public static class DelayedAction
{
    private static Dictionary<Action, Tuple<DateTime, TimeSpan>> _actions;

    static DelayedAction()
    {
        _actions = new Dictionary<Action, Tuple<DateTime, TimeSpan>>();
    }

    public static void Add(Action a, TimeSpan executionInterval)
    {
        lock (_actions)
        {
            _actions.Add(a, new Tuple<DateTime, TimeSpan>(DateTime.MinValue, executionInterval));
        }
    }

    public static void ExecuteIfNeeded(Action a)
    {
        // Note: the lock is held while a() runs, so concurrent requests
        // wait for the update to finish instead of triggering it twice.
        lock (_actions)
        {
            Tuple<DateTime, TimeSpan> t = _actions[a];
            if (DateTime.Now - t.Item1 > t.Item2)
            {
                _actions[a] = new Tuple<DateTime, TimeSpan>(DateTime.Now, t.Item2);
                a();
            }
        }
    }
}

It is thread safe, and you can add as many delayed actions as you want.

To use it, just move your XML retrieval and save code into a function; let's call it updateNews:

// Made static so it can be registered from the static constructor below;
// the repository "re" must then also be reachable from a static context.
private static void updateNews()
{
    // Get data from xml url (this is the code that should not run every time a user visits the view)
    var url = "http://www.interneturl.com/file.xml";
    XNamespace dcM = "http://search.yahoo.com/mrss/";
    var xdoc = XDocument.Load(url);
    var items = xdoc.Descendants("item")
        .Select(item => new
        {
            Title = item.Element("title").Value,
            Description = item.Element("description").Value,
            Link = item.Element("link").Value,
            PubDate = item.Element("pubDate").Value,
            MyImage = (string)item.Elements(dcM + "thumbnail")
                .Where(i => i.Attribute("width").Value == "144" && i.Attribute("height").Value == "81")
                .Select(i => i.Attribute("url").Value)
                .SingleOrDefault()
        })
        .ToList();

    // Fill my db entities with the xml data (this is the code that should not run every time a user visits the view)
    foreach (var item in items)
    {
        var date = DateTime.Parse(item.PubDate);
        if (!item.Title.Contains(":") && !(date <= DateTime.Now.AddDays(-1)))
        {
            News NewsItem = new News();
            var CategoryID = 2;

            var WorldCategoryID = re.GetByCategoryID(CategoryID);
            NewsItem.Category = WorldCategoryID;

            NewsItem.Description = item.Description;
            NewsItem.Title = item.Title.Replace("'", "");
            NewsItem.Image = item.MyImage;

            NewsItem.Link = item.Link;
            NewsItem.Date = DateTime.Parse(item.PubDate);
            re.AddNews(NewsItem);
            re.save();
        }
    }
}

Then add a static constructor to your controller:

static MyController()
{
    DelayedAction.Add(updateNews, new TimeSpan(0, 10, 0)); // set interval to 10mn
}

and then in your Index method:

public ActionResult Index()
{
    DelayedAction.ExecuteIfNeeded(updateNews);

    //All code below this commenting needs to run everytime a user visits the view
    ....
}

That way, every time you get a request for this page, you check whether you need to update your data. And you can use this for any other processing that needs to be delayed.

That could be a good addition to caching.

ppetrov
  • Thank you for a great answer. What is the real difference between output cache and this way of doing it? – Obsivus May 31 '13 at 07:25
  • You're welcome:). The difference is that the output cache is a more "global" method. It will cache the entire result of your method. So if you have some dynamic elements other than news, they will also be cached and remain static. With the method I suggested you have a finer control. So depending on the situation you can use one or another (and since you said you had 50 methods like that, there may be situations when caching the whole result won't be enough). – ppetrov May 31 '13 at 13:39
  • also the output cache works at the framework level and it is a cache. The method above isn't a cache, it's a kind of nested scheduler. You can even control it while your site is running if you make some adjustments, you can force an update, change the execution interval... It's all about how much control you want to have over this. Both output cache and this method can solve the problem, it's all about choice ;) – ppetrov May 31 '13 at 13:44

First of all, you shouldn't be deleting news at runtime. You can do that manually or on a schedule. So instead of

var GetAllItems = re.GetAllWorldNewsByID();
foreach (var newsitemz in GetAllItems)
{
    if (newsitemz.Date <= DateTime.Now.AddDays(-1))
    {
        re.DeleteNews(newsitemz);
        re.save();
    }
}

use this:

var GetAllItems = re.GetAllWorldNewsByID().Where(x=>x.Date > DateTime.Now.AddDays(-1)).ToList();

GetAllWorldNewsByID() must return IQueryable, because if you return a List you lose the benefits of deferred execution (see What are the benefits of a Deferred Execution in LINQ?). When you don't delete at runtime, you don't have a big delay from that maintenance operation (it isn't for users, but for cleaning your database).
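A self-contained illustration of deferred execution (plain LINQ-to-objects here, but the same principle is what lets an IQueryable translate the Where into SQL instead of filtering in memory):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    private static int itemsProduced;

    private static IEnumerable<int> Numbers()
    {
        for (int i = 0; i < 100; i++)
        {
            itemsProduced++; // counts how many items were actually pulled
            yield return i;
        }
    }

    public static int Run()
    {
        itemsProduced = 0;
        // Building the query enumerates nothing yet.
        var query = Numbers().Where(n => n % 2 == 0);
        // Only ToList() pulls items, and Take(3) stops after 3 matches.
        var firstThreeEven = query.Take(3).ToList(); // 0, 2, 4
        return itemsProduced; // 5 items pulled (0..4), not all 100
    }
}
```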

Second, you can use caching:

// Get data from xml url (this is the code that should not run every time a user visits the view)
var url = "http://www.interneturl.com/file.xml";
// Get data from cache (if available)
List<TypeOfItems> GetAllItems = (List<TypeOfItems>)HttpContext.Current.Cache.Get(/*unique identity of your xml, such as its url*/ url);
if (GetAllItems == null)
{
    var xdoc = XDocument.Load(url);
    var items = xdoc.Descendants("item").Select(item => new
        {
            Title = item.Element("title").Value,
            Description = item.Element("description").Value,
            Link = item.Element("link").Value,
            PubDate = item.Element("pubDate").Value,
            MyImage = (string)item.Elements(dcM + "thumbnail")
               .Where(i => i.Attribute("width").Value == "144" && i.Attribute("height").Value == "81")
               .Select(i => i.Attribute("url").Value)
               .SingleOrDefault()
        })
        .ToList();

    // Fill db

    GetAllItems = re.GetAllWorldNewsByID().Where(x => x.Date > DateTime.Now.AddDays(-1)).ToList();

    // put data into the cache
    HttpContext.Current.Cache.Add(/*unique identity of your xml, such as its url*/ url, /*data*/ GetAllItems, null,
        DateTime.Now.AddMinutes(1) /*absolute expiration*/,
        System.Web.Caching.Cache.NoSlidingExpiration,
        System.Web.Caching.CacheItemPriority.Default, null);
}
Xordal

Right now my ActionResult does the following things:

  • Retrieves an XML file from an internet URL
  • Fills the XML data into my DB
  • Fills my ViewModel from the DB data
  • Returns the model to the view

In all those steps, the XML file is one of the weak/slow links.

In my applications I have replaced XML serialization with Protobuf (protobuf-net) and the speed changed dramatically!

The read, write, and transfer of the XML file make these steps slow, especially if you do them on each call.

So the first thing I would change is the XML serialization, swapping it for a faster one.
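For reference, a minimal protobuf-net sketch (this assumes the third-party protobuf-net NuGet package; NewsDto is a made-up stand-in for your data, and this only helps where you control the producer of the serialized data, not a remote XML feed you don't own):

```csharp
using System.IO;
using ProtoBuf; // protobuf-net NuGet package

[ProtoContract]
public class NewsDto
{
    [ProtoMember(1)] public string Title { get; set; }
    [ProtoMember(2)] public string Link { get; set; }
}

public static class NewsSerializer
{
    public static byte[] Save(NewsDto item)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, item); // compact binary, much smaller than XML
            return ms.ToArray();
        }
    }

    public static NewsDto Load(byte[] data)
    {
        using (var ms = new MemoryStream(data))
        {
            return Serializer.Deserialize<NewsDto>(ms);
        }
    }
}
```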

Aristos