
We have to retrieve data from a hideously limited Web API that was provided to us by means beyond our control. The logic flow is as follows:

For each product represented by a list of ID values
  Get each batch of sub-categories of type FOO (100 records max. per call)
    Keep calling the above until no records remain
  Get each batch of sub-categories of type BAR (100 records max. per call)
    Keep calling the above until no records remain

At present this generates nearly 100 web API calls (we've asked the provider, and there's no way to improve the web API to mitigate this).

I'm worried that performance will suffer drastically because of this, so I'm trying to get my head around the asynchronous alternative in the hope that it will help. One major issue is that the data can only be requested once; after that it becomes locked and isn't resent, which is a major limitation for our testing.

I've read here, here and here, but am struggling to adapt the examples to my code, as I think I need two await calls rather than one and am worried things will get messed up.

Can anyone please apply the async and await logic to this pseudo code so that I can then read up and try to follow the flow of what's happening?

public class DefaultController : Controller
{
    public ActionResult Index()
    {
        var idlist = new List<String>() {"123", "massive list of strings....", "789"};

        var xdoc = new XDocument();
        xdoc.Declaration = new XDeclaration("1.0", Encoding.Unicode.WebName, "yes");
        var xroot = new XElement("records");
        xdoc.Add(xroot);

        foreach (string id in idlist)
        {
            // Get types FOO -----------------------------------
            Boolean keepGoingFOO = true;
            while (keepGoingFOO)
            {
                // 100 records max per call
                var w = new WebServiceClient();
                request.enumType = enumType.FOO;
                var response = w.response();
                foreach (ResultItem cr in response.ResultList)
                {
                    var xe = new XElement("r");
                    // create XML 
                    xroot.Add(xe);
                }               
                keepGoingFOO = response.moreRecordsExist;
            }

            // Get types BAR -----------------------------------
            Boolean keepGoingBAR = true;
            while (keepGoingBAR)
            {
                // 100 records max per call
                var w = new WebServiceClient();
                request.enumType = enumType.BAR;
                var response = w.response();
                foreach (ResultItem cr in response.ResultList)
                {
                    var xe = new XElement("r");
                    // create XML 
                    xroot.Add(xe);
                }               
                keepGoingBAR = response.moreRecordsExist;
            }                           
        }


        return View(xdoc);
    }
}
EvilDr

3 Answers


Should get you started:

public async Task<ActionResult> Index()
{
    var idlist = new List<string>() { "123", "massive list of strings....", "789" };
    IEnumerable<XElement> list = await ProcessList(idlist);
    //sort the list as it will be completely out of order
    var xdoc = new XDocument(new XElement("records", list));
    return View(xdoc);
}

public async Task<IEnumerable<XElement>> ProcessList(IEnumerable<string> idlist)
{
    IEnumerable<XElement>[] processList = await Task.WhenAll(idlist.Select(FooBar));
    return processList.SelectMany(x => x);
}

private async Task<IEnumerable<XElement>> FooBar(string id)
{
    Task<IEnumerable<XElement>> foo = Foo(id);
    Task<IEnumerable<XElement>> bar = Bar(id);
    return ((await bar).Concat(await foo));
}

private async Task<IEnumerable<XElement>> Bar(string id)
{
    var localListOfElements = new List<XElement>();
    var keepGoingBar = true;
    while (keepGoingBar)
    {
        var response = await ServiceCallAsync(); //make sure you use the async version
        localListOfElements.Add(new XElement("r"));
        keepGoingBar = response.moreRecordsExist;
    }
    return localListOfElements;
}

private async Task<IEnumerable<XElement>> Foo(string id)
{
    var localListOfElements = new List<XElement>();
    var keepGoingFoo = true;
    while (keepGoingFoo)
    {
        var response = await ServiceCallAsync(); //make sure you use the async version
        localListOfElements.Add(new XElement("r"));
        keepGoingFoo = response.moreRecordsExist;
    }
    return localListOfElements;
}

private async Task<Response> ServiceCallAsync()
{
    await Task.Delay(1000);//simulation
    return new Response();
}
weston

There are multiple issues with your code. It can be greatly improved by refactoring, since the only thing that changes between the two loops is request.enumType. Performance could also be greatly improved with proper use of async/await; as long as the ids are independent, the question is not how to parallelize two calls, but how to parallelize as much as possible.

The part that is slowing you down is not the XML work; it is the web API calls.

I would refactor it as

async Task<Tuple<string, enumType, XElement>> SendRequest(string id, enumType input){
    ..
}

And replace the foreach loop with

List<Tuple<string, enumType>> tupleList = idList.Select(id => Tuple.Create(id, enumType.BAR)).ToList();

tupleList.AddRange(idList.Select(id => Tuple.Create(id, enumType.FOO)));

Task<Tuple<string, enumType, XElement>>[] all = tupleList
    .Select(c => SendRequest(c.Item1, c.Item2))
    .ToArray();

var res = await Task.WhenAll(all);

The res variable will contain all the XElement values you want to add, which should be fast. You could instead use a key/value pair with the id-enumType tuple as the key, but the idea is the same.
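
For completeness, here is a rough sketch of what SendRequest could look like. It reuses the pseudo objects from the question (WebServiceClient, request, ResultItem) and assumes the client exposes an asynchronous version of the call, named responseAsync() here, so treat it as a sketch rather than compilable code; it pulls the 100-record batches for one id/enumType pair sequentially and aggregates them into a single XElement:

async Task<Tuple<string, enumType, XElement>> SendRequest(string id, enumType input)
{
    // Collect every record returned for this id/type pair under one element
    var xe = new XElement("records");

    var keepGoing = true;
    while (keepGoing)
    {
        // 100 records max per call; request setup (including the id) is
        // elided here, exactly as in the question's pseudo code
        var w = new WebServiceClient();
        request.enumType = input;
        var response = await w.responseAsync(); // assumed async counterpart of w.response()

        foreach (ResultItem cr in response.ResultList)
            xe.Add(new XElement("r")); // create XML per record

        keepGoing = response.moreRecordsExist;
    }

    return Tuple.Create(id, input, xe);
}

Back in the action you would then take each element from res (or its children) and add it to the root of the XDocument.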

user3141326

To make the solution more elegant, I would hide the batching behind an enumerable object, like weston proposed, but instead of putting all items in one list, consume them as soon as they are available (to minimize memory utilization).

With the AsyncEnumerator NuGet package you can write the code like this:

public class DefaultController : Controller
{
    public async Task<ActionResult> Index()
    {
        var idlist = new List<String>() { "123", "massive list of strings....", "789" };

        var xdoc = new XDocument();
        xdoc.Declaration = new XDeclaration("1.0", Encoding.Unicode.WebName, "yes");
        var xroot = new XElement("records");
        xdoc.Add(xroot);

        foreach (string id in idlist) {

            // Get types FOO -----------------------------------
            var foos = EnumerateItems(enumType.FOO);

            await foos.ForEachAsync(cr => {
                var xe = new XElement("r");
                // create XML 
                xroot.Add(xe);
            });

            // Get types BAR -----------------------------------
            var bars = EnumerateItems(enumType.BAR);

            await bars.ForEachAsync(cr => {
                var xe = new XElement("r");
                // create XML 
                xroot.Add(xe);
            });
        }

        return View(xdoc);
    }

    public IAsyncEnumerable<ResultItem> EnumerateItems(enumType itemType)
    {
        return new AsyncEnumerable<ResultItem>(async yield => {

            Boolean keepGoing = true;
            while (keepGoing) {

                // 100 records max per call
                var w = new WebServiceClient();
                request.enumType = itemType;

                // MUST BE ASYNC CALL
                var response = await w.responseAsync();

                foreach (ResultItem cr in response.ResultList)
                    await yield.ReturnAsync(cr);

                keepGoing = response.moreRecordsExist;
            }
        });
    }
}

Note 1: making everything async does not actually improve the performance of a single routine, and in fact makes it a little bit slower (due to the overhead of async state machines and TPL tasks). However, it helps to better utilize the worker threads in your app overall.

Note 2: your client request must be async, otherwise the optimization makes no sense; if you do it synchronously, it blocks the thread and just waits for the server to respond.

Note 3: please pass a CancellationToken to every async method; that's good practice (see the sketch after these notes).

Note 4: judging from the code, you are inside a web server, so I do not recommend running all batches for all items in parallel and waiting for them with Task.WhenAll; if your service client is synchronous (the w.response call), it will just block a lot of threads and your whole web service might become unresponsive.
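
To illustrate Note 3, here is a minimal, hypothetical sketch of threading a CancellationToken through the call chain (recent MVC/Web API versions can bind a CancellationToken parameter of an async action automatically; the method names below are placeholders, not part of the solution above):

public async Task<ActionResult> Index(CancellationToken cancellationToken)
{
    // Pass the token through every async call so that a disconnected client
    // can cancel the remaining web API requests.
    var xdoc = await BuildDocumentAsync(cancellationToken);
    return View(xdoc);
}

private async Task<XDocument> BuildDocumentAsync(CancellationToken cancellationToken)
{
    cancellationToken.ThrowIfCancellationRequested();
    await Task.Delay(1000, cancellationToken); // stands in for the real async service call
    return new XDocument(new XElement("records"));
}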

In the proposed solution you can still do one trick in parallel: you can read ahead the next batch while processing the current one, like this:

    public IAsyncEnumerable<ResultItem> EnumerateItemsWithReadAhead(enumType itemType)
    {
        return new AsyncEnumerable<ResultItem>(async yield => {

            Task<Response> nextBatchTask = FetchNextBatch(itemType);
            Boolean keepGoing = true;

            while (keepGoing) {

                var response = await nextBatchTask;

                // Kick off the next batch request (read ahead)
                keepGoing = response.moreRecordsExist;
                if (keepGoing)
                    nextBatchTask = FetchNextBatch(itemType);

                foreach (ResultItem cr in response.ResultList)
                    await yield.ReturnAsync(cr);
            }
        });
    }

    private Task<Response> FetchNextBatch(enumType itemType)
    {
        // 100 records max per call
        var w = new WebServiceClient();
        request.enumType = itemType;

        // MUST BE ASYNC
        return w.responseAsync();
    }
Serge Semenov