I am using C# and ASP.NET Core 6 MVC.

I have a requirement to fetch all results from an API using an offset, whether there are just 64 records or 6300. I want to adjust the offset and make concurrent (parallel) calls to get all the records at once, and I would like to do this in the best way possible.

I am calling an API that returns a maximum of 100 records per call, although the overall total (totalResult) can be around 65, 120, 1500, 2520, 6534, etc. There is an integer offset that I can pass to the API to get the next 100 results each time. By default it is zero, which brings back at most the first 100 records.

For example, for a totalResult of 65, offset 0 is sufficient, as it brings back all 65 records. For a totalResult of 150, offset 0 brings back 100 records, and the next call needs offset 100 to bring back the rest. Likewise, for 6530 records the offset has to be stepped through 100, 200, 300... to get all the results.
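To make the offset arithmetic concrete, here is a minimal sketch. `BuildOffsets` is a hypothetical helper written for illustration (it is not part of the API being called), and the page size of 100 is taken from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns every offset needed to page through `total` records.
IReadOnlyList<int> BuildOffsets(int total, int pageSize = 100)
{
    if (total <= 0) return Array.Empty<int>();

    // Ceiling division: 65 -> 1 page, 150 -> 2 pages, 6530 -> 66 pages.
    int pageCount = (total + pageSize - 1) / pageSize;

    return Enumerable.Range(0, pageCount)
                     .Select(page => page * pageSize)
                     .ToArray();
}

Console.WriteLine(string.Join(", ", BuildOffsets(65)));   // 0
Console.WriteLine(string.Join(", ", BuildOffsets(150)));  // 0, 100
Console.WriteLine(BuildOffsets(6530).Count);              // 66
```

The last offset for 6530 records is 6500, which returns the final 30 records.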

Now I need to run these calls in parallel to avoid the delay.

This is my function:

var offset = 0;

// My async call method
var addressResult = await _postcode.GetAddresses(strPostcode, offset);

if (addressResult?.Results != null && addressResult.Results.Any())
{
        // concurrency code to run here with offset
        int total = addressResult.Header.TotalResults; // total results, e.g. 6500
        var thePostcoderesult = addressResult.Results;

        // Number of records returned by this call; it depends on the total
        int maxresult = thePostcoderesult.Count();
}

So in the end, when all the concurrent calls to the API have finished, thePostcoderesult should have all the results added to it.

var thePostcoderesult = addressResult.Results;

Now, I am aware we can achieve this through

await Parallel.ForEachAsync(offsets, options, async (offset, ct) =>

as shown in the accepted answer to "How to make multiple API calls faster?"

I tried implementing that logic, but it only gives me up to 1000 results, as the offsets and the parallel loop are somehow not aligned. The tasks run only 10 times, which gives 1000 results, although the postcode I am searching for has 1630 results.

Here is my updated code, but as I mentioned, it does not wait for all the calls to finish, nor does it run through the full set of offsets.

var offset = 0;
var addressResult = await _postcode.GetAddresses(strPostcode, offset);

if (addressResult?.Results != null && addressResult.Results.Any())
{
    int total = addressResult.Header.TotalResults;

    // Setting offset here - but something is not right
    IEnumerable<int> offsets = Enumerable
            .Range(0, total)
            .Select(n => checked(n * 100))
            .TakeWhile(offset => offset < Volatile.Read(ref total));

    // wanted to use 10 parallel threads which is a safe bet I believe
    var options = new ParallelOptions() { MaxDegreeOfParallelism = 10 };
 
    var thePostcoderesult = new List<AddressResult>();
    await Parallel.ForEachAsync(offsets, options, async (offset, ct) =>
        {
            var addressResult = await _postcode.GetAddresses(strPostcode, offset);

            if (offset == 0)
            {   //I am not using it
                //Volatile.Write(ref total, Jresult.Results.Count());
            }
            thePostcoderesult.AddRange(addressResult.Results);
        });

    return thePostcoderesult;
}

Apologies in advance for the detailed post. If you can help me do this in a more correct or neater way, you are welcome to.

Many thanks

marc_s
matrixnew
  • Lists aren't thread-safe. Use a `ConcurrentBag` instead. I expect [this](https://stackoverflow.com/questions/5874317/thread-safe-listt-property) answers your question. – ProgrammingLlama Nov 29 '22 at 03:00
  • Does every request, regardless of offset return a field with a number telling you the total number of records? – AceGambit Nov 29 '22 at 03:06
  • @AceGambit I see where you're going. Looking at OP's code, it does actually look like they have an initial call to the API to get the total, and then they build the rest of the requests from that. It does indeed seem like they could just pre-instantiate a collection and fill in the correct positions for each call. – ProgrammingLlama Nov 29 '22 at 03:09
  • Yeah, if you know how many you're gonna have, you can just create a list of integers which are your offsets, plus a plain old array. Fill `offset` to `offset + 99` for each offset and then just parallelize a loop around the offsets. Or, if you wanna get fancy, iterate over the offsets and kick off Tasks, and use Task.WaitAll() at the end (my preferred version of parallelism). – AceGambit Nov 29 '22 at 03:15
  • @AceGambit You should add an answer to that effect :) – ProgrammingLlama Nov 29 '22 at 03:17
  • In case you are interested for a `Parallel.ForEachAsync` variant that returns results, you could look at [this](https://stackoverflow.com/questions/30907650/foreachasync-with-result "ForEachAsync with Result") question. Are you targeting .NET 6 or later? – Theodor Zoulias Nov 29 '22 at 04:50
  • @ProgrammingLlama [my opinion](https://stackoverflow.com/questions/15400133/when-to-use-blockingcollection-and-when-concurrentbag-instead-of-listt/64823123#64823123) about the `ConcurrentBag` iznogoud! – Theodor Zoulias Nov 29 '22 at 05:00
  • @AceGambit - Yes, every time I make a request, the API tells me the TotalRecord and MaxResult – matrixnew Nov 29 '22 at 09:28
  • Question to all: does that mean I need to create as many tasks as there are offsets? Each offset returns up to 100 records, so for 6500 records do I need to create 65 tasks? If yes, how, @AceGambit? – matrixnew Nov 29 '22 at 09:46
  • @matrixnew In my answer, there's a `for` loop populating the `tasks` array with the results of `Task.Run`. Calling `Task.Run` creates a new task. – AceGambit Nov 30 '22 at 01:09
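
The suggestions in the comments above (avoid a shared `List<T>`, and pre-allocate a slot for each call) can be combined with `Parallel.ForEachAsync`. The following is only a sketch under stated assumptions: `FakeGetAddressesAsync` is a made-up stand-in for `_postcode.GetAddresses`, plain strings stand in for `AddressResult`, and the total of 1630 is hard-coded rather than read from `Header.TotalResults`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int PageSize = 100;
const int Total = 1630; // assumption: stands in for Header.TotalResults

// Hypothetical stand-in for _postcode.GetAddresses: returns up to PageSize fake records.
async Task<List<string>> FakeGetAddressesAsync(int offset, CancellationToken ct)
{
    await Task.Delay(10, ct); // simulate network latency
    int count = Math.Clamp(Total - offset, 0, PageSize);
    return Enumerable.Range(offset, count).Select(i => $"address-{i}").ToList();
}

int pageCount = (Total + PageSize - 1) / PageSize;

// One slot per page. Each iteration writes only to its own slot,
// so no lock or concurrent collection is needed.
var pages = new List<string>[pageCount];

var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };
await Parallel.ForEachAsync(Enumerable.Range(0, pageCount), options, async (page, ct) =>
{
    pages[page] = await FakeGetAddressesAsync(page * PageSize, ct);
});

// Flatten in page order, so the combined list keeps the API's ordering.
List<string> all = pages.SelectMany(p => p).ToList();
Console.WriteLine(all.Count); // 1630
```

`MaxDegreeOfParallelism` caps how many calls are in flight at once; the loop still visits every page, so all 17 pages are fetched even though only 10 run concurrently.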

1 Answer

You've got a lot going on there; I don't think it needs to be quite that complicated. Since it seems the initial GetAddresses call tells you how many records you're going to have, you can do something like this:

var initialResponse = await _postcode.GetAddresses(strPostcode, 0);

if (initialResponse?.Results == null || !initialResponse.Results.Any())
{
  return Array.Empty<AddressResult>();
}

var totalPostCodeResults = new AddressResult[initialResponse.Header.TotalResults];

// fill up to the first 100 since you have it and bail if that's all there is
FillItems(initialResponse.Results, totalPostCodeResults, 0);

if(totalPostCodeResults.Length <= 100)
  return totalPostCodeResults;

// Fill the offsets (aka start indexes) starting at 100
var offsets = new List<int>();
var offset = 100;
while(offset < totalPostCodeResults.Length)
{
  offsets.Add(offset);
  offset+=100;
}

// No extra step is needed for the last partial page: the loop above already adds its offset

// Kick off a task for each offset range
var tasks = new Task[offsets.Count];
for (int i = 0; i < tasks.Length; i++)
{
  // copy i to a loop-scoped variable so each lambda captures its own value
  var index = i;
  tasks[index] = Task.Run(async () =>
  {
    var response = await _postcode.GetAddresses(strPostcode, offsets[index]);
    FillItems(response.Results, totalPostCodeResults, offsets[index]);
  });
}

// Wait for all of them to finish
await Task.WhenAll(tasks);

return totalPostCodeResults;

void FillItems(List<AddressResult> results, AddressResult[] totalArray, int startIndex)
{
  var index = startIndex;
  results.ForEach(item => totalArray[index++] = item);
}
AceGambit
  • Let me try this. Does it mean we need to open exactly as many parallel tasks as there are offsets? – matrixnew Nov 29 '22 at 09:42
  • There is a small problem on this line: `results.ForEach(item => totalArray[index++] = item);`. If there are, for example, 1688 results, the index increments to 1689 and this throws an out-of-range exception – matrixnew Nov 29 '22 at 10:54
  • @matrixnew, I don't think I follow. If `index` is 5, then `totalArray[index++] = whatever` sets `totalArray[5]` to whatever and THEN increments `index` to 6. – AceGambit Nov 30 '22 at 01:07
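
On the two points raised in these comments, here is a hedged sketch rather than a definitive fix. It creates one task per offset but uses a `SemaphoreSlim` so that only 10 requests run at a time, and it clamps each copy so that a page can never write past the end of the array. The cause of the reported out-of-range exception is not established in this thread; the clamp only guards against one possibility, namely the API returning more items than `TotalResults` implied. `FakeGetAddressesAsync`, `FetchPageAsync`, and the string records are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int PageSize = 100;
const int Total = 1688; // assumption: stands in for Header.TotalResults

// Hypothetical stand-in for _postcode.GetAddresses.
async Task<List<string>> FakeGetAddressesAsync(int offset)
{
    await Task.Delay(10); // simulate network latency
    int count = Math.Clamp(Total - offset, 0, PageSize);
    return Enumerable.Range(offset, count).Select(i => $"address-{i}").ToList();
}

var all = new string[Total];
using var gate = new SemaphoreSlim(10); // at most 10 requests in flight

// Hypothetical helper: fetches one page and copies it into its range of the array.
async Task FetchPageAsync(int offset)
{
    await gate.WaitAsync();
    try
    {
        List<string> page = await FakeGetAddressesAsync(offset);

        // Clamp the copy so an unexpectedly long page cannot run past the array.
        int count = Math.Min(page.Count, all.Length - offset);
        page.CopyTo(0, all, offset, count);
    }
    finally
    {
        gate.Release();
    }
}

int pageCount = (Total + PageSize - 1) / PageSize;
var offsets = Enumerable.Range(0, pageCount).Select(page => page * PageSize);

// One task per offset; the semaphore decides how many actually run at once.
await Task.WhenAll(offsets.Select(FetchPageAsync));

Console.WriteLine(all.Count(a => a != null)); // 1688
```

So the number of tasks does equal the number of offsets, but the number of simultaneous API calls is whatever the semaphore allows.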