2

I'm trying to generate a zip file of pdfs asynchronously to speed things up as follows:

var files = new Dictionary<string, byte[]>();
var fileTasks = new List<Task<Library.Models.Helpers.File>>();

foreach (var i in groups)
{
    var task = Task.Run(async () =>
    {
        var fileName = $"{i.Key.Title.Replace('/', '-')} - Records.pdf";
        ViewBag.GroupName= i.Key.Title;
        var html = await this.RenderViewAsync("~/Views/Report/_UserRecordsReport.cshtml", i.ToList(), true);

        return await _fileUtilityService.HtmlToPDF2(html, null, fileName);
    });
    fileTasks.Add(task);
}

var completedTaskFiles = await Task.WhenAll(fileTasks);
foreach(var item in completedTaskFiles)
{
    files.Add($"{item.FileName}", item.FileResult);
}

return _fileUtilityService.GenerateZIP(files);

I'm generating all my html to pdf file tasks and waiting for them to be completed - then trying to synchronously loop through the completed tasks and add them to my dictionary for zipping but I keep getting the following error:

An item with the same key has already been added

There is no duplicate key in the list of items being added.

EDIT - so the current idea is that because its a scoped service, thats why i'm running into thread issues (attached the file utility service for information)

public class FileUtilityService : IFileUtilityService
    {
        private readonly IHttpClientFactory _clientFactory;
        public FileUtilityService(IHttpClientFactory clientFactory)
        {
        

public async Task<byte[]> HtmlToPDF(string html = null, string url = null)
        {
            try
            {
                byte[] res = null;
                if (html is null && url != null)
                {
                    var client = _clientFactory.CreateClient();
                    var requestResp = await client.GetAsync(url);
                    using var sr = new StreamReader(await requestResp.Content.ReadAsStreamAsync());
                    html = HttpUtility.HtmlDecode(await sr.ReadToEndAsync());
                }

                using(var ms = new MemoryStream())
                {
                    HtmlConverter.ConvertToPdf(html, ms);
                    res =  ms.ToArray();
                }
                return res;
                
            }catch(Exception ex)
            {
                throw ex;
            }
        }

        public async Task<Library.Models.Helpers.File> HtmlToPDF(string html = null, string url = null, string fileName = "")
        {
            return new Library.Models.Helpers.File() { FileName = fileName, FileResult = await HtmlToPDF(html, url) };
        }
  • In order to determine which file has been added multiple times please use the following LINQ query: `completedTaskFiles.GroupBy(item => item.FileName).Where(@group => @group.Count() > 1).Select(@group => @group.Key);` – Peter Csala Mar 11 '21 at 08:20
  • This looks like a variable capture problem to me. Inside your `foreach` create a local variable `var group = i;` and then use that `group` inside your `Task` logic. This is a common occurrence with multithreading. You reference `i` inside your logic, but by the time that logic actually executes (will be on another thread) `i` could very well already have a different value. –  Mar 11 '21 at 08:48
  • 1
    @Knoop the foreach loop is not affected by the [captured variable in a loop](https://stackoverflow.com/questions/271440/captured-variable-in-a-loop-in-c-sharp) problem. At least not after it was fixed in C# 5. You can read [this](https://ericlippert.com/2009/11/12/closing-over-the-loop-variable-considered-harmful-part-one/) Eric Lippert's blog post for the details. – Theodor Zoulias Mar 11 '21 at 09:12
  • 1
    @TheodorZoulias Interesting, did not know that, thanks for the clarification! –  Mar 11 '21 at 09:19
  • Are you sure that the `_fileUtilityService` can be used safely by multiple threads concurrently? If the class in not documented to be thread-safe, you should probably assume that it isn't. I would suggest to create a dedicated instance of the class for each group. – Theodor Zoulias Mar 11 '21 at 09:19
  • There is information missing. You claim the error happens before `Task.WhenAll` returns. But the example code does not use any dictionary directly, so either the information is wrong, or there is some dictionary or similar used in one of the called methods/properties. Debugging should reveal the actual problem. – JonasH Mar 11 '21 at 10:50
  • @JonasH Seems like i was incorrect about the error happening before the Task.WhenAll – Stuart Lexington Mar 11 '21 at 12:11
  • 2
    I'd be concerned with ViewBag.GroupName potentially being modified by another task. Is ViewBag an Async local? – pinkfloydx33 Mar 11 '21 at 12:25
  • @TheodorZoulias the problem seems to dissapear if i move the htmlToPdf function to inside the controller and inject the IHttpClientFactory there instead. - Any input on a neater alternative? Perhaps a static function that receives a client instance for the request. – Stuart Lexington Mar 11 '21 at 12:27
  • After seeing the `FileUtilityService` implementation, I am now concerned about the thread-safety of the method `HtmlConverter.ConvertToPdf`. Regarding suggestions for alternative designs, the dependency injection concept in not my forte, so I can't suggest anything. – Theodor Zoulias Mar 11 '21 at 13:02

0 Answers0