
Hello everyone, and thanks in advance for your help. The following question might sound naive, but I'm a beginner in this area.

I have a method that gets some information from my database and sends it to an external system, using a POST call and, in case the information has changed, a PATCH call. I use Entity Framework. The table involved contains at least 165k rows.

My question is the following: is there a way to optimize and speed up the whole process, perhaps using multithreading or parallelism? I'm a beginner on the subject and I hope some of you can help me understand.

The method is the following:

public async Task<List<dynamic>> SyncOrdersTaskAsync(int PageSize)
{
    int PageIndex = 0;

    if (PageSize <= 0) PageSize = 100;

    const string phrase = "The fields order, task_code must make a unique set";

    var sorting = new SortingCriteria {
        Properties = new string[] { "WkOpenDate ASC" } };

    List<dynamic> listTest = new List<dynamic>();

    using (var uow = this.Factory.BeginUnitOfWork())
    {
        var repo = uow.GetRepository<IWorkOrderRepository>();

        var count = await repo.CountAllAsync();

        count = 150;   // NOTE: hard-coded override of the real count (presumably left in for testing)

        for (PageIndex = 0; PageIndex <= count / PageSize; PageIndex++)
        {
            var paging = new PagingCriteria
            {
                PageIndex = PageIndex,
                PageSize = PageSize
            };

            var rows = await repo.GetByCriteriaAsync(
                "new {WkID, CompanyID, JobNo, JobTaskNo ,WkNumber, WkYear," +
                "WkYard,WkCustomerID,CuName,WkDivisionID,DvName,BusinessUnit," +
                "BusinessUnitManagerID,BusinessUnitManager,WkWorkTypeID,WtName," +
                "WkActivityID,WkActivityDescription,NoteDescrLavoro,WkWOManagerID," +
                "ProjectManager,IDMaster,ProjectCoordinator,WkOpenDate," +
                "WkDataChiusa,Prov,CodiceSito,CodiceOffice,CodiceLavorazione," +
                "CodiceNodo,DescrizioneNodo,WkPrevisionalStartDate,WkRealStartDate," +
                "WkPrevisionalEndDate,WkRealEndDate,NumeroOrdine," +
                "WkPrevisionalLabourAmount,TotaleCosti,SumOvertimeHours," +
                "SumTravelHours,SumNormalHours,WkProgressPercentage,Stato,CUP,CIG," +
                "TotaleManodopera,TotalePrestazioni,TotaleNoli,TotaleMateriali," +
                "SumAuxiliaryHours,TipoCommessa,TotaleOrdine, WkPreventivoData," +
                "WkConsuntivoData,TotaleFatturato,AggregateTotaleFatturato," +
                "AggregateTotalePrestazioni,Contract,CustomerOrderNumber," +
                "XmeWBECode,LastUpdateDate,PreGestWkID,CommercialNotes,Mandant," +
                "GammaProjectName,WkInventoryDate,WkCloseFlag,WkNote," +
                "TotalRegisteredLabour,TotalRegisteredPerformances," +
                "TotalRegisteredLeasings,TotalRegisteredMaterials,FlagFinalBalance," +
                "FinalBalance,OrderDate,TotalOrderDivision,SearchDescription," +
                "TotaleBefToBeApproved,TotaleBefToBeApprovedLeasings," +
                "TotaleLabourToBeApproved,AggregateLevel, AggregateTotalLabour," +
                "AggregateTotalLeasings,AggregateTotalMaterials," +
                "AggregateTotalRegisteredLabour," +
                "AggregateTotalRegisteredPerformances," +
                "AggregateTotalRegisteredLeasings," +
                "AggregateTotalRegisteredMaterials," +
                "AggregateTotalCost,AggregateSumNormalHours," +
                "AggregateSumAuxiliaryHours,AggregateSumRainHours," +
                "AggregateSumTravelHours,AggregateSumOvertimeHours," +
                "AggregateWkPrevisionalLabourAmount,AggregateFinalBalance," +
                "AggregateTotalOrder,AggregateTotalOrderDivision," +
                "AggregateTotalBefToBeApproved," +
                "AggregateTotalBefToBeApprovedLeasings," +
                "AggregateTotalLabourToBeApproved,TotalProduction," +
                "AggregateTotalProduction,JobTaskDescription}", paging, sorting);

            String url = appSettings.Value.UrlV1 + "order_tasks/";

            using (var httpClient = new HttpClient())
            {
                httpClient.DefaultRequestHeaders.Add("Authorization", "Token " +
                    await this.GetApiKey(true));
                if (rows.Count() > 0)
                {
                    foreach (var row in rows)
                    {
                        var testWork = (Model.WorkOrderCompleteInfo)Mapper
                            .MapWkOrdersCompleteInfo(row);
                        var orderIdDiv = await this.GetOrderForSyncing(httpClient,
                            testWork.JobNo);
                        var jsonTest = new JObject();
                        jsonTest["task_code"] = testWork.JobTaskNo;
                        jsonTest["description"] = testWork.JobTaskDescription;
                        jsonTest["order"] = orderIdDivitel.Id;
                        jsonTest["order_date"] = testWork.OrderDate.HasValue
                            ? testWork.OrderDate.Value.ToString("yyyy-MM-dd")
                            : string.IsNullOrEmpty(testWork.OrderDate.ToString())
                                ? "1970-01-01"
                                : testWork.OrderDate.ToString().Substring(0, 10);
                        jsonTest["progress"] = testWork.WkProgressPercentage;

                        var content = new StringContent(jsonTest.ToString(),
                            Encoding.UTF8, "application/json");
                        var result = await httpClient.PostAsync(url, content);
                        if (result.Content != null)
                        {
                            var responseContent = await result.Content
                                .ReadAsStringAsync();
                            // Did the API report the unique-constraint error,
                            // i.e. the order/task pair already exists remotely?
                            bool alreadyExists = responseContent.Contains(phrase);

                            if (alreadyExists)
                            {
                                var taskCase = await GetTaskForSyncing(httpClient,
                                    testWork.JobTaskNo, orderIdDiv.Id.ToString());
                                var idCase = taskCase.Id;
                                String urlPatch = appSettings.Value.UrlV1 +
                                    "order_tasks/" + idCase + "/";
                                bool isSame = taskCase.Equals(testWork
                                    .toSolOrderTask());
                                if (!isSame)
                                {
                                    var resultPatch = await httpClient.PatchAsync(
                                        urlPatch, content);
                                    if (resultPatch != null)
                                    {
                                        var responsePatchContent = await resultPatch
                                            .Content.ReadAsStringAsync();
                                        var jsonPatchContent = JsonConvert
                                            .DeserializeObject<dynamic>(
                                            responsePatchContent);
                                        listTest.Add(jsonPatchContent);
                                    }
                                }
                                else
                                {
                                    listTest.Add(taskCase.JobTaskNo_ +
                                        " is already updated!");
                                }
                            }
                            else
                            {
                                var jsonContent = JsonConvert
                                    .DeserializeObject<dynamic>(responseContent);
                                listTest.Add(jsonContent);
                            }
                        }
                    }
                }
            }
        }

        return listTest;
    }
}

Maybe I need to apply parallelism to the for loop?

Again, thanks to everyone in advance, and I hope I was clear :)

  • Sounds like you need to rethink how this API works. You need to be able to batch upload a lot of data at once, which would speed things up significantly. – Charlieface Jul 07 '22 at 10:32
  • As a side note, the `HttpClient` class is intended to be instantiated [once](https://learn.microsoft.com/en-us/aspnet/web-api/overview/advanced/calling-a-web-api-from-a-net-client#create-and-initialize-httpclient), and reused throughout the life of an application. – Theodor Zoulias Jul 07 '22 at 11:06
  • Regarding the `asp.net-mvc` tag, is your intention to run this code as part of creating the response for a request in a web application? Also are you targeting the .NET 6 runtime, or an earlier .NET platform? – Theodor Zoulias Jul 07 '22 at 11:09
  • @TheodorZoulias not at the moment but in the future it is possible, so I don't exclude this possibility. And I'm using .net6 – Frank DG Jul 07 '22 at 12:45
  • @Charlieface I'm a beginner in the matter of parallelizing processes – Frank DG Jul 07 '22 at 12:45
  • Like I said, you need to rethink why you want to parallelize this in the first place. It's going to be much faster if instead you convert the API to accept a bulk upload, which the server can process as a batch. It's the difference between if you had to move a lot of stones: hiring 100 people to carry one stone each at the same time, or sending one person with a big bucket load of stones. – Charlieface Jul 07 '22 at 13:07
  • As a side note, the line `if (rows.Count() > 0)` probably causes your `repo.GetByCriteriaAsync` query to be executed twice. Most likely this query is not materialized. It is a deferred enumerable, and it is executed against your database again and again each time it is enumerated one way or another. And the `Count()` is one way to enumerate it. You don't need this line: in case the Count() is zero, the following `foreach (var row in rows)` will just do zero iterations. Nothing bad is going to happen. (See the sketch after these comments.) – Theodor Zoulias Jul 07 '22 at 13:10

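To make the side notes from the comments concrete, here is a minimal sketch (not the actual code from the question; `loadPageAsync` and the plain-string payloads are placeholders standing in for the repository and mapper) showing a single reused `HttpClient` and a loop that no longer calls `Count()` on the deferred query:

using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public class OrderTaskSyncer
{
    // One HttpClient for the lifetime of the application, as the linked docs
    // recommend, instead of `new HttpClient()` inside the paging loop.
    private static readonly HttpClient Http = new HttpClient();

    // `loadPageAsync` stands in for repo.GetByCriteriaAsync from the question,
    // and each row is assumed to be an already-serialized JSON payload.
    public async Task SyncPageAsync(Func<Task<IEnumerable<string>>> loadPageAsync, string url)
    {
        var rows = await loadPageAsync();

        // No `if (rows.Count() > 0)` guard: Count() would enumerate (and possibly
        // re-execute) the deferred query; an empty sequence just makes the loop
        // run zero times.
        foreach (var row in rows)
        {
            var content = new StringContent(row, Encoding.UTF8, "application/json");
            await Http.PostAsync(url, content);
        }
    }
}
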
1 Answer

The handiest tool currently available for parallelizing asynchronous work is the Parallel.ForEachAsync method, introduced in .NET 6. Your code is quite complex though, and deciding where to put this loop is not obvious.

Ideally you would like to call Parallel.ForEachAsync only once, so that it parallelizes your work with a single, configurable degree of parallelism from start to finish. Generally you don't want to put this method inside an outer for/foreach loop, because then the degree of parallelism will fluctuate over the course of the operation. But since your code is complex, I would go the easy way and do just that. I would replace this code:

foreach (var row in rows)
{
    //...
}

...with this:

ParallelOptions options = new() { MaxDegreeOfParallelism = 2 };
await Parallel.ForEachAsync(rows, options, async (row, _) =>
{
    //...
});

You have to make one more change. The List<T> is not thread-safe, so it will get corrupted if you call Add from multiple threads without synchronization. You can either take a lock (listTest) around each listTest.Add call, or replace the list with a concurrent collection. My suggestion is to do the latter:

ConcurrentQueue<dynamic> listTest = new();
//...
listTest.Enqueue(jsonContent);
//...
return listTest.ToList();

After doing these changes, hopefully your code will still work correctly, and it will be running a bit faster. Then you'll have to experiment with the MaxDegreeOfParallelism setting, until you find the one that yields the optimal performance. Don't go crazy with large values like 100 or 1000. In most cases overparallelizing is harmful, and might yield worse performance than not parallelizing at all.
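Putting the pieces together, the reworked inner loop might look roughly like the sketch below. This is a condensed illustration, not the full method from the question: the mapping and PATCH logic is reduced to a single POST, the rows are assumed to be pre-serialized JSON payloads, and the HttpClient is shared rather than created per page.

using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class SyncExample
{
    private static readonly HttpClient Http = new HttpClient();

    public static async Task<List<dynamic>> PostPageAsync(IEnumerable<string> rows, string url)
    {
        // Thread-safe collection, because the lambda below runs concurrently.
        ConcurrentQueue<dynamic> results = new();

        ParallelOptions options = new() { MaxDegreeOfParallelism = 2 };
        await Parallel.ForEachAsync(rows, options, async (payload, cancellationToken) =>
        {
            var content = new StringContent(payload, Encoding.UTF8, "application/json");
            var response = await Http.PostAsync(url, content, cancellationToken);
            var body = await response.Content.ReadAsStringAsync(cancellationToken);
            results.Enqueue(body);
        });

        return results.ToList();
    }
}
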

Theodor Zoulias
  • It seems to work, but sometimes this message appears: "detail: request was throttled. expected available in 1 second" – Frank DG Jul 08 '22 at 09:00
  • @FrankDG the target server might dislike being bombarded with too many requests. Do you have any documentation about how many requests are allowed every X seconds? If you don't, you could consider incorporating a rate-limiter in your code (like [this](https://stackoverflow.com/questions/65825673/partition-how-to-add-a-wait-after-every-partition/65829971#65829971 "Partition: How to add a wait after every partition") one) and then try to find the server's rate-limiting policy through experimentation. – Theodor Zoulias Jul 08 '22 at 09:18
  • 1
    Honestly I don't have any documentation about how many requests are allowed. Anyway thank you for helping me :) If I encounter other problems I will add a comment here @TheodorZoulias – Frank DG Jul 08 '22 at 09:43
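
To illustrate the rate-limiting idea from the last two comments (this is not the implementation from the linked answer, just a minimal sketch, and the one-request-per-second interval is a guess that would need to be tuned against the real server policy):

using System;
using System.Threading;
using System.Threading.Tasks;

// Allows at most one request start per `interval`, no matter how many
// workers Parallel.ForEachAsync is running concurrently.
public sealed class SimpleRateGate
{
    private readonly TimeSpan _interval;
    private readonly SemaphoreSlim _mutex = new(1, 1);
    private DateTime _nextAllowedUtc = DateTime.MinValue;

    public SimpleRateGate(TimeSpan interval) => _interval = interval;

    public async Task WaitAsync(CancellationToken cancellationToken = default)
    {
        await _mutex.WaitAsync(cancellationToken);
        try
        {
            var wait = _nextAllowedUtc - DateTime.UtcNow;
            if (wait > TimeSpan.Zero)
                await Task.Delay(wait, cancellationToken);
            _nextAllowedUtc = DateTime.UtcNow + _interval;
        }
        finally
        {
            _mutex.Release();
        }
    }
}

// Usage inside the Parallel.ForEachAsync lambda, assuming roughly one request per second:
//   var gate = new SimpleRateGate(TimeSpan.FromSeconds(1));
//   ...
//   await gate.WaitAsync(cancellationToken);
//   var response = await httpClient.PostAsync(url, content, cancellationToken);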