I have gone through many Stack Overflow threads but I am still not sure why this is taking so long. I am using VS 2022, and this is a console app targeting .NET 6.0. There is just one file, Program.cs, which contains both the function and the call to it.

I am making a GET call to an external API. Since that API returns 10000+ rows, I am trying to call my method that calls this API 2-3 times in parallel. Each call also adds to a shared concurrent collection declared at the top (a `ConcurrentBag` in the code below), which I then query with LINQ to show some summaries on the UI.

The same external GET call takes less than 30 seconds in Postman, but my app takes forever.

Here is my code. Right now this entire code is in Program.cs of a Console application. I plan to move the GetAPIData() method to a class library after this works.

static async Task GetAPIData(string url, ConcurrentBag<Stats> bag, int taskNumber)
{
    var client = new HttpClient();
    var serializer = new JsonSerializer();
    client.Timeout = TimeSpan.FromMilliseconds(Timeout.Infinite);
    var bearerToken = "xxxxxxxxxxxxxx";
    client.DefaultRequestHeaders.Add("Authorization", $"Bearer {bearerToken}");

    using (var stream = await client.GetStreamAsync(url).ConfigureAwait(false))
    using (var sr = new StreamReader(stream))
    using (JsonTextReader reader = new JsonTextReader(sr))
    {
        // The response is read as a sequence of JSON objects, one Stats per object.
        reader.SupportMultipleContent = true;
        while (reader.Read())
        {
            if (reader.TokenType == JsonToken.StartObject)
            {
                bag.Add(serializer.Deserialize<Stats?>(reader));
            }
        }
    }
}

My calling code:

var taskNumber = 1;
var url = "https://extrenalsite/api/stream";
var bag = new ConcurrentBag<Stats>();

var task1 = Task.Run(() => GetAPIData(url, bag, taskNumber++));
var task2 = Task.Run(() => GetAPIData(url, bag, taskNumber++));
var task3 = Task.Run(() => GetAPIData(url, bag, taskNumber++));

await Task.WhenAll(task1, task2, task3);

Please let me know why it takes so long to execute even though I have spawned 3 threads.

Thanks.

Theodor Zoulias
  • 1) `HttpClient` is not meant to be instantiated multiple times. 2) All three of your tasks do the same thing, right? This will not speed things up. 3) For comparison, how does your code perform with only one task? – Klaus Gütter Feb 05 '23 at 06:15
  • I have a feeling that you should use pagination. IMHO, fetching that much data in one go is not a realistic use case. – Red Star Feb 05 '23 at 06:17
  • You tagged your question with ASP.NET and TPL but none of these technologies are used in your code. – Klaus Gütter Feb 05 '23 at 06:19
  • @KlausGütter, thanks for your response. I moved the `HttpClient` out of the method and now pass it in. Surprisingly, if I run it with just one task, it is super fast. But if I include another task, it slows down. I don't understand why that is. – hillcountry99 Feb 05 '23 at 06:26
  • @KlausGütter, yes. Eventually I will be moving the calling code to asp.net core and will inject the HttpClient and all that. Yes, no asp.net right now:-). – hillcountry99 Feb 05 '23 at 06:27
  • As it runs fast with one task, either the API access itself slows down with multiple concurrent accesses, or it is an issue with the concurrent access to the `bag`. To solve the latter, change your method to return a `Task<List<Stats>>` instead of directly writing to `bag`, and merge the three lists after awaiting all tasks. – Klaus Gütter Feb 05 '23 at 06:32
  • @KlausGütter, it's still the same issue after I changed it to return `Task<List<Stats>>`. With one task it works well. With 2 it runs forever. – hillcountry99 Feb 05 '23 at 06:52
  • Then maybe the server does not like to be called multiple times in parallel from the same client. – Klaus Gütter Feb 05 '23 at 06:53
  • That could be it too. So far I have only made a single call at a time from Postman. Let me see if I can spawn 2-3 calls using Postman. Thanks again @Klaus! – hillcountry99 Feb 05 '23 at 06:56
  • Not relevant to your main problem, but instead of `taskNumber++` please pass `1`, `2`, and `3`. Incrementing a shared `int` variable is not thread-safe. – Theodor Zoulias Feb 05 '23 at 06:57
  • Can you share precise measurements of the time it takes with 1, 2, and 3 tasks? "Super fast" and "takes forever" do not convey enough information. Also could you do the same by removing the `JsonTextReader` complexity, and just using the `HttpClient.GetStringAsync` to return the raw content? – Theodor Zoulias Feb 05 '23 at 07:03
  • Also please mention the platform and version that the Console application is running on (.NET Framework, .NET Core etc), and whether you are running it in debug or release mode, and with debugger attached or not. – Theodor Zoulias Feb 05 '23 at 07:06
  • @TheodorZoulias, wow. I did not know that. Replaced it with 1,2,3. When it is 2+ tasks, it is definitely more than 15 minutes and I had to stop the app. I didn't have the patience to wait. But I can let it run. Super fast is around 20 seconds.. Trying GetStringAsync now – hillcountry99 Feb 05 '23 at 07:08
  • @TheodorZoulias, I am running VS 2022 with a target of .NET 6. It's a console app and yes, I am running it in debug mode, but I don't have any breakpoints set, since I know they could interfere with the threads. Let me run it without debugging as well. – hillcountry99 Feb 05 '23 at 07:11
  • We have now collected some important additional information in the comments, where it is not easily visible. Consider editing your question to put this information there. – Klaus Gütter Feb 05 '23 at 07:13
  • Yep, the `taskNumber++` is running on the `ThreadPool`, so in parallel. In case you refactor your code to use a `for` loop, also be aware of this issue: [Captured variable in a loop](https://stackoverflow.com/questions/271440/captured-variable-in-a-loop-in-c-sharp). Also please share measurements with the program running without the debugger attached (started with Ctrl+F5). – Theodor Zoulias Feb 05 '23 at 07:13
  • You could also try running in parallel two instances of your program, with each instance running a single task, and report the behavior. – Theodor Zoulias Feb 05 '23 at 07:19
  • @TheodorZoulias, The GetStringAsync() just freezes because the data is too large. So, I have to use GetStreamAsync(); – hillcountry99 Feb 05 '23 at 07:24
  • I apologize. I included all the details in the main post now. – hillcountry99 Feb 05 '23 at 07:27
  • You mean `GetStringAsync()` running on a single task? It sounds strange that your machine has enough memory to hold all the deserialized `Stats` objects, but not enough for the raw data. Do you have any idea how long is this string? – Theodor Zoulias Feb 05 '23 at 07:30
  • Something else you could try is, instead of putting the `Stats` in a list, just discard them and only return their total number. I am starting to think that your problem is related to your machine running out of memory. – Theodor Zoulias Feb 05 '23 at 07:33
  • I am now having a problem with even one task now. I just checked. CPU is pegged at 2% and memory is absolutely normal. My laptop has 8 cores as well. It just beats me. @TheodorZoulias, yes. It is on a single task. – hillcountry99 Feb 05 '23 at 07:42
  • @KlausGütter btw I think that the umbrella term [Task Parallel Library](https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/task-parallel-library-tpl) (TPL) includes the `Task.Run` method. So the [task-parallel-library] tag is somewhat relevant. – Theodor Zoulias Feb 05 '23 at 07:45
  • Maybe you exceeded some quota on the API server now? – Klaus Gütter Feb 05 '23 at 07:45
  • It shows that error actually when that happens. – hillcountry99 Feb 05 '23 at 07:51
  • I am not sure if these kinds of comments are allowed on Stack Overflow, but I don't want to be rude by leaving a reply unanswered. I think I will take this back up again tomorrow. Thanks again! – hillcountry99 Feb 05 '23 at 07:52
  • Could you clarify what you are comparing? Because comparing 1 postman call to multiple C# calls + synchronization + linq + UI framework is not comparable. – Selmir Aljic Feb 05 '23 at 08:02
  • I suggest you use some [stopwatch](https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.stopwatch?view=net-7.0)es and measure where the latency is. Or use Fiddler and check the stats. It could very well be that the bottleneck is somewhere else, e.g. the service could be rate limiting you, you could be overwhelming the service, you could be throttled by a WAF because they think you're attempting a DoS, or you might have crappy wifi. I am not sure the problem is your code. – John Wu Feb 05 '23 at 09:33
  • I eventually created my own API that calls the external API. In my API I use start-time/end-time logic: in a loop that runs while the start time is before the end time (a 2-minute window), I keep calling the external API and collecting data. Once the end time has passed, I return whatever data I have collected. On the front end, I call my API using a stopwatch and fire 2-3 threads concurrently every 3 minutes. Even though the data is slow to display, it works continuously: I see new data on the screen every 2 minutes and the UI stays responsive. – hillcountry99 Feb 06 '23 at 16:47
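
The time-boxed loop described in the last comment could look roughly like the sketch below. This is not the asker's actual code: `CollectForAsync` and `fakeFetch` are hypothetical names, the fake fetch stands in for one call to the external API, and the 200 ms window replaces the 2-minute one so the example finishes quickly. It assumes a .NET 6 console project with implicit usings enabled.

```csharp
using System.Diagnostics;

// Hypothetical stand-in for one call to the external API: about 50 ms, two rows.
var callCount = 0;
Func<Task<IReadOnlyList<int>>> fakeFetch = async () =>
{
    await Task.Delay(50);
    callCount++;
    return new[] { 1, 2 };
};

// 200 ms window here; the comment above describes a 2-minute window.
var rows = await CollectForAsync(fakeFetch, TimeSpan.FromMilliseconds(200));
Console.WriteLine($"{callCount} calls, {rows.Count} rows collected");

// Keeps calling fetchBatch until the window has elapsed,
// then returns everything collected so far.
static async Task<List<T>> CollectForAsync<T>(
    Func<Task<IReadOnlyList<T>>> fetchBatch, TimeSpan window)
{
    var collected = new List<T>();
    var stopwatch = Stopwatch.StartNew();

    while (stopwatch.Elapsed < window)
    {
        collected.AddRange(await fetchBatch().ConfigureAwait(false));
    }

    return collected;
}
```

Note that the elapsed time is only checked between calls, so a single slow call can overrun the window. Passing a `CancellationToken` into the fetch would be one way to enforce a hard limit.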

1 Answer

If it is not necessary to immediately write to the collection, you can improve performance by collecting the results locally in the task method. This does not need to do thread synchronisation and will therefore be faster:

static async Task<List<Stats>> GetAPIData(string url, int taskNumber)
{
    var list = new List<Stats>();
    // your code, but instead of writing to `bag`:
        list.Add(serializer.Deserialize<Stats>(reader));
    // ...
    return list;
}

// Pass literal task numbers: incrementing a shared int from several tasks is not thread-safe.
var task1 = Task.Run(() => GetAPIData(url, 1));
var task2 = Task.Run(() => GetAPIData(url, 2));
var task3 = Task.Run(() => GetAPIData(url, 3));

await Task.WhenAll(task1, task2, task3);

var allResults = task1.Result.Concat(task2.Result).Concat(task3.Result);
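
For completeness, here is a fuller sketch that combines the per-task list with two suggestions from the comments: literal task numbers and a `Stopwatch` around each download. Several things in it are assumptions rather than the original code. The `Stats` class is a hypothetical shape, since the real type is not shown. The `openStream` delegate is a hypothetical seam that reads an in-memory sample so the sketch runs offline. It also assumes the Newtonsoft.Json package and a .NET 6 console project with implicit usings.

```csharp
using System.Diagnostics;
using System.Text;
using Newtonsoft.Json;

// Stand-in payload so the sketch runs offline: two concatenated JSON objects.
const string sample = "{\"Name\":\"a\"} {\"Name\":\"b\"}";

// In the real program this would be () => client.GetStreamAsync(url),
// with ONE HttpClient instance shared by all tasks.
Func<Task<Stream>> openStream = () =>
    Task.FromResult<Stream>(new MemoryStream(Encoding.UTF8.GetBytes(sample)));

// Literal task numbers: no shared counter is mutated from several threads.
var tasks = new[] { 1, 2, 3 }
    .Select(n => Task.Run(() => GetAPIData(openStream, n)))
    .ToArray();

List<Stats>[] lists = await Task.WhenAll(tasks);
var allResults = lists.SelectMany(l => l).ToList();
Console.WriteLine($"Total rows: {allResults.Count}");

static async Task<List<Stats>> GetAPIData(Func<Task<Stream>> openStream, int taskNumber)
{
    var stopwatch = Stopwatch.StartNew();
    var list = new List<Stats>(); // local to this task, so no synchronisation is needed
    var serializer = new JsonSerializer();

    using var stream = await openStream().ConfigureAwait(false);
    using var sr = new StreamReader(stream);
    using var reader = new JsonTextReader(sr) { SupportMultipleContent = true };

    while (reader.Read())
    {
        if (reader.TokenType == JsonToken.StartObject)
        {
            var item = serializer.Deserialize<Stats>(reader);
            if (item != null) list.Add(item);
        }
    }

    // Per-task timing shows whether the slowdown scales with the number of tasks.
    Console.WriteLine($"Task {taskNumber}: {list.Count} rows in {stopwatch.ElapsedMilliseconds} ms");
    return list;
}

// Hypothetical shape; the real Stats type is not shown in the question.
public class Stats
{
    public string? Name { get; set; }
}
```

If the per-task times against the real API grow sharply as soon as a second task is added, that points at the server or the network rather than at the collection handling.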
Klaus Gütter