2

I have a API Post method that takes is a string which represents a Bae64 string of bytes from a word document that the API converts to PDF. My test client sends multiple documents, each on its own task, to the API to be converted. The problem is with concurrency and writing the files. I end up with a file in use since the calls are parallel. I have tried a lot of different way to block the conversion process until a document is converted but none of it has worked. Everything works fine if it's jsut a single file being converted but as soon as it's 2 or more, the problem happens. Can anyone guide me in the correct direction to solve this issue?

API:

        [HttpPost]
        public async Task<SimpleResponse> Post([FromBody]string request)
        {
            var response = new SimpleResponse();
            Task t = Task.Factory.StartNew(async () =>
            {
                try
                {
                    Converter convert = new Converter();

                    var result = await convert.CovertDocToPDF(request, WebConfigurationManager.AppSettings["tempDocPath"], WebConfigurationManager.AppSettings["tempPdfPath"]);

                    response.Result = result;
                    response.Success = true;

                }
                catch (Exception ex)
                {
                    response.Exception = ex;
                    response.Success = false;
                    response.Errors = new List<string>();
                    response.Errors.Add(string.Format("{0}, {1}", ex.Message, ex.InnerException?.Message ?? ""));
                }
            });
            t.Wait();
            return response;
        }

Conversion code

        public Task<string> CovertDocToPDF(string blob, string tempDocPath, string tempPdfPath)
        {
            try
            {
                // Convert blob back to bytes
                byte[] bte = Convert.FromBase64String(blob);

                // Process and return blob
                return Process(bte, tempDocPath, tempPdfPath);
            }
            catch (Exception Ex)
            {
                throw Ex;
            }            
        }

        private async Task<string> Process(byte[] bytes, string tempDocPath, string tempPdfPath)
        {
            try
            {
                string rs = RandomString(16, true);
                tempDocPath = tempDocPath + rs + ".docx";
                tempPdfPath = tempPdfPath + rs + ".pdf";

                // This is where the problem happens with concurrent calls. I added
                // the try catch when the file is in use to generate a new
                // filename but the error still happens.
                try
                {
                    // Create a temp file
                    File.WriteAllBytes(tempDocPath, bytes);
                }
                catch (Exception Ex)
                {
                    rs = RandomString(16, true);
                    tempDocPath = tempDocPath + rs + ".docx";
                    tempPdfPath = tempPdfPath + rs + ".pdf";
                    File.WriteAllBytes(tempDocPath, bytes);
                }

                word.Application app = new word.Application();
                word.Document doc = app.Documents.Open(tempDocPath);
                doc.SaveAs2(tempPdfPath, word.WdSaveFormat.wdFormatPDF);

                doc.Close();
                app.Quit(); // Clean up the word instance.

                // Need the bytes to return the blob
                byte[] pdfFileBytes = File.ReadAllBytes(tempPdfPath);

                // Delete temp files
                File.Delete(tempDocPath);
                File.Delete(tempPdfPath);

                // return blob
                return Convert.ToBase64String(pdfFileBytes);
            }
            catch (Exception Ex)
            {
                throw Ex;
            }
        }

Client:

        public async void btnConvert_Click(object sender, EventArgs e)
        {
            var response = await StartConvert();            

            foreach (SimpleResponse sr in response)
            {
                if (sr.Success)
                {
                    byte[] bte = Convert.FromBase64String(sr.Result.ToString());

                    string rs = RandomString(16, true);
                    string pdfFileName = tempPdfPath + rs + ".pdf";

                    if (File.Exists(pdfFileName))
                    {
                        File.Delete(pdfFileName);
                    }

                    System.IO.File.WriteAllBytes(pdfFileName, bte);
                }
                else
                {
                }
            }
        }

        private async Task<IEnumerable<SimpleResponse>> StartConvert()
        {
            var tasks = new List<Task<SimpleResponse>>();

            foreach (string s in docPaths)
            {
                byte[] bte = File.ReadAllBytes(s);

                tasks.Add(ConvertDocuments(Convert.ToBase64String(bte)));
            }

            return (await Task.WhenAll(tasks));
        }


        private async Task<SimpleResponse> ConvertDocuments(string requests)
        {
            using (var client = new HttpClient(new HttpClientHandler() { UseDefaultCredentials = true }))
            {              
                client.BaseAddress = new Uri(BaseApiUrl);
                client.DefaultRequestHeaders.Add("Accept", "application/json");

                // Add an Accept header for JSON format.    
                client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));//application/json
                HttpRequestMessage request = new HttpRequestMessage(HttpMethod.Post, BaseApiUrl + ApiUrl);

                var data = JsonConvert.SerializeObject(requests);

                request.Content = new StringContent(data, Encoding.UTF8, "application/json");

                HttpResponseMessage response1 = await client.PostAsync(BaseApiUrl + ApiUrl, request.Content).ConfigureAwait(false);
                var response = JsonConvert.DeserializeObject<SimpleResponse>(await response1.Content.ReadAsStringAsync());

               
                return response;
            }            
        }

Random String Generator

        public string RandomString(int size, bool lowerCase = false)
        {
            var builder = new StringBuilder(size);

            // Unicode/ASCII Letters are divided into two blocks
            // (Letters 65–90 / 97–122):   
            // The first group containing the uppercase letters and
            // the second group containing the lowercase.  

            // char is a single Unicode character  
            char offset = lowerCase ? 'a' : 'A';
            const int lettersOffset = 26; // A...Z or a..z: length = 26  

            for (var i = 0; i < size; i++)
            {
                var @char = (char)_random.Next(offset, offset + lettersOffset);
                builder.Append(@char);
            }

            return lowerCase ? builder.ToString().ToLower() : builder.ToString();
        }

Michael Brown
  • 97
  • 1
  • 10
  • 1
    Can you please share code for `RandomString`? Maybe it generates same result for concurrent requests? Try using `Guid.NewGuid` instead of call to `RandomString` – Guru Stron Jan 15 '21 at 19:41
  • Please see original post. I added that function – Michael Brown Jan 15 '21 at 19:51
  • 1
    How `_random` is created? Is it shared between processes or it is created for each request? And again - try changing all calls to it with `Guid.NewGuid`. – Guru Stron Jan 15 '21 at 19:54
  • @GuruStron Let me run some tests on that and get back to you – Michael Brown Jan 15 '21 at 19:57
  • @GuruStron - So if I send 5 files for example, and I set a breakpoint on the 1st line in RandomString. That breakpoint gets hit 5 times before going to the next line. Each line in code gets hit based on the number of files I sent. That is obviously a problem and I do not know how to resolve it – Michael Brown Jan 15 '21 at 20:08
  • 2
    Yes and it does make sense) also setting breakpoints can change the behavior of the system. Try logging generated file name to console or somewhere else. `Random` class is not thread safe AFAIK so you possibly need to lock `Next` calls, as for creating instance of it per request - you have a chance of creating ones with the [same seed](https://stackoverflow.com/a/3049478/2501279). – Guru Stron Jan 15 '21 at 20:15
  • 1
    Try using `Guid.NewGuid` to generate random file name or `Path.GetTempFileName` – Guru Stron Jan 15 '21 at 20:16
  • So should I try to generate a GUID and write to the console to see the results of each request? – Michael Brown Jan 15 '21 at 20:16
  • 1
    I would say that you can just generate the guids =) – Guru Stron Jan 15 '21 at 20:18
  • @GuruStron - That did it. Thank you! – Michael Brown Jan 15 '21 at 20:33
  • Will add as an answer then. – Guru Stron Jan 15 '21 at 21:15

2 Answers2

4

First, get rid of Task.Factory.StartNew ... t.Wait() - you don't need an additional task, the root level method is async and your blocking Wait just spoils the benefits of async by blocking synchronously.

Second, like a comment suggested above, the file name random string generator is most likely to be not really random. Either do not supply anything to the seed value of your pseudo-random gen, or use something like Environment.TickCount which should be sufficient for this. Guid.NewGuid() will work too.

Another good option for temp files is Path.GetTempFileName (also generates an empty file for you): https://learn.microsoft.com/en-us/dotnet/api/system.io.path.gettempfilename?view=netstandard-2.0

[HttpPost]
public async Task<SimpleResponse> Post([FromBody]string request)
{
    var response = new SimpleResponse();
    try
    {
        ...

        var result = await convert.CovertDocToPDF(...);

        ...
    }
    catch (Exception ex)
    {
        ...
    }
    
    return response;
}
Roman Polunin
  • 53
  • 3
  • 12
  • The Task.Factory.StartNew was left over code I was trying. It has been removed. I will try a GUID instead and see if that will fix it. – Michael Brown Jan 15 '21 at 19:51
  • Here's your friend here: https://learn.microsoft.com/en-us/dotnet/api/system.io.path.gettempfilename?view=netstandard-2.0 – Roman Polunin Jan 15 '21 at 19:55
1

Based on your code it seems that you have a "faulty" random string generator for file name (I would say _random.Next is a suspect, possibly some locking and/or "app wide" instance could fix the issue). You can use Guid.NewGuid to create random part of file name (which in theory can have collisions also but in most practical cases should be fine) or Path.GetTempFileName:

var rs = Guid.NewGuid().ToString("N");
tempDocPath = tempDocPath + rs + ".docx";
tempPdfPath = tempPdfPath + rs + ".pdf";
Guru Stron
  • 102,774
  • 10
  • 95
  • 132