
I developed a .NET HTTP-triggered Azure Function with a maximum memory limit of 1 GB. However, my code consumes more memory than that, so the memory set limit is reached, the worker stops, the HTTP request is aborted, and the caller gets a 5XX error.

The function receives an ID as a parameter and generates a list of about 140 BuildArtifacts; each object in the list has a download URI for a zip file containing a log file in .txt format. The function opens each zip file in read-only mode using ZipArchiveMode.Read, filters the stack traces inside each .txt file, and appends them to a string variable called responseString, which is sent back as the response.

The problem is that the function exceeds the 1 GB memory limit, and throttling the reads to stay under the limit pushes the response time past 10 minutes, at which point the function stops executing because that is the maximum response time allowed for the Azure Function. Additionally, I cannot create or modify files in the wwwroot directory or any other directory that is part of the package, and Path.GetTempPath() is not an option because the temporary folder only offers about 100 MB of storage, which is not enough to hold the contents of the zip files.

Code:-

int maxParallelism = 15; // maximum number of tasks to run in parallel
var semaphore = new SemaphoreSlim(maxParallelism);
var buildArtifacts = await GetBuildArtifacts(buildId);
var tasks = new List<Task>(); // collect the per-artifact tasks so Task.WhenAll below can await them all

foreach (var artifact in buildArtifacts.value)
{
    await semaphore.WaitAsync(); // wait for an available slot
    Logger.Log(artifact.name);
    tasks.Add(WriteExceptionsToFile(artifact, stackTraceKeywords, context, buildId, semaphore));
}
await Task.WhenAll(tasks).ConfigureAwait(false);
public static async Task WriteExceptionsToFile(Artifact artifact, List<List<string>> stackTraceKeywords, Microsoft.Azure.WebJobs.ExecutionContext context, int buildId, SemaphoreSlim semaphore)
{
    try
    {
        var artifactName = artifact.name;

        if (artifactName.Contains("Logs"))
        {

            var logsUri = artifact.resource.downloadUrl;
            var logsFolder = $"{artifactName}";
            var fileName = $"{artifactName}.zip";
            await ReadFileContentFromZipArchive(logsUri, buildId, artifactName, stackTraceKeywords);
        }

    }
    finally
    {
        semaphore.Release();
    }

}
static async Task ReadFileContentFromZipArchive(string artifactLogsDownloadUri, int buildId, string artifactName, List<List<string>> stackTraceKeywords)
{

    HttpResponseMessage response = await client.GetAsync(artifactLogsDownloadUri);
    var stream = new BufferedStream(await response.Content.ReadAsStreamAsync(), 8 * 1024); // buffer size of 8KB

    using (var archive = new ZipArchive(stream, ZipArchiveMode.Read))
    {
        foreach (ZipArchiveEntry entry in archive.Entries)
        {
            if (!string.IsNullOrEmpty(entry.Name))
            {
                using (var sr = new StreamReader(entry.Open(), Encoding.UTF8))
                {
                    await ProcessLogFile(sr, artifactName, stackTraceKeywords);
                }
            }
        }
    }
}

static async Task ProcessLogFile(StreamReader sr, string artifactName, List<List<string>> stackTraceKeywords)
{
    string keyword = "KEYWORD";
    bool shouldProcess = false;

    int lineCount = 0;
    StringBuilder stackTrace = new StringBuilder();

    while (!sr.EndOfStream)
    {
        string line = await sr.ReadLineAsync();

        if (line.Contains(keyword))
        {
            shouldProcess = true;


            //Logger.Log($"line:- {line}");
        }

        if (shouldProcess)
        {
            stackTrace.AppendLine(line);
            lineCount = 0;
            // want next 20 lines of stack trace
            while ((line = await sr.ReadLineAsync()) != null)
            {
                if (lineCount > 20)
                {
                    break;
                }

                stackTrace.AppendLine(line + "\n");
                lineCount++;
            }

            bool[] results = stackTraceKeywords.Select(kw => ContainsAllKeywords(stackTrace.ToString(), kw)).ToArray();
            if (results.Any(x => x))
            {
                continue; // as keywords are found, skip this stacktrace
            }

            //FormatTheStacktrace(stacktrace);

            var exceptionInformation = $"Artifact Name:- {artifactName}\n\n{stackTrace.ToString()}\n==============================================\n";
            //Logger.Log($"exceptionInformation:- {exceptionInformation}");
            responseString += exceptionInformation;

            shouldProcess = false;
            stackTrace.Clear();

        }
    }
}

How can I optimize the code so that the memory limit is not exceeded and the response time stays under 10 minutes?

Any suggestion/advice for the approach?

Vansh
    What database are you using? A database should be installed on a machine that is the fastest available with the most amount of memory possible. Your issue is not your code. It is the server that you installed the database on. Get a faster machine. – jdweng Mar 25 '23 at 10:22
  • @jdweng code doesn't utilize any database as of now. – Vansh Mar 25 '23 at 12:03
  • Isn't Azure a database application? – jdweng Mar 25 '23 at 12:38
  • @jdweng Azure is a cloud provider owned by Microsoft. They provide a lot of things: various database services, as well as Azure Functions which is serverless code execution. You can combine them, but this does not. – Charlieface Mar 26 '23 at 03:23
  • @Charlieface : A zip file is technically a database. The OP should be using a real database instead of a zip file. Not sure, but I think the unzip method is on the server (not client) and adding more memory on the server will solve the issue. It is taking over 10 minutes to pull files from the zip. – jdweng Mar 26 '23 at 08:28
  • @jdweng That's a ridiculous thing to say. Zip files have many and varied uses for file storage, most of which are inappropriate in a database. Yes there is only a server involved, not a client, but you don't have much access to the server in this type of service. What OP needs here are optimizations that allow streaming the data through rather than buffering it in memory, which is eminently possible if you examine my answer below. – Charlieface Mar 26 '23 at 08:41
  • @Charlieface : Read the definition of Database. It is a file that stores data which is what a ZIP file. That is why they call it an ARCHIVE. The OP has a design issue. He created a database that is so large he cannot read the data because he doesn't have enough memory. – jdweng Mar 26 '23 at 08:55
  • @jdweng Best practice is definitely not in agreement there, especially if large files or compression is involved. https://softwareengineering.stackexchange.com/questions/365637/is-storing-files-of-up-to-50mb-in-size-in-a-database-for-use-by-multiple-servers https://softwareengineering.stackexchange.com/questions/150669/is-it-a-bad-practice-to-store-large-files-10-mb-in-a-database https://stackoverflow.com/questions/3748/storing-images-in-db-yea-or-nay#3756 https://stackoverflow.com/questions/38120895/database-vs-file-system-storage – Charlieface Mar 26 '23 at 11:09
  • @Charlieface : The issue here is memory. It requires a lot more memory to extract data than to add data. Adding data is a simple append operation with very little memory; extracting needs much more. The OP created an archive and now doesn't have enough memory to extract the data. – jdweng Mar 26 '23 at 13:12
  • @jdweng It actually looks like it's a fairly straightforward way of processing zipped log files (which itself is a standard way of rolling over old log files). If you worked as a sysadmin on both Windows and Linux you would be pretty familiar with this. OP is simply downloading them from some API and processing them. There is no "append" happening here at all, I think you are misunderstanding the use case. – Charlieface Mar 26 '23 at 13:20
  • @Charlieface : To create the archive is an append. The append worked. Now the extract is failing due to memory. Adding and removing from a ZIP makes the file huge. There is no compression done on a ZIP. When an item is removed it is removed from the directory and leaves large voids in the file. Most ZIP utilities will not try to overwrite the voids. The issue here is trying to extract from a ZIP when you do not have enough memory. – jdweng Mar 26 '23 at 13:47

1 Answer


It's hard to say without having seen a full Memory Dump analysis with a tool such as ANTS Memory Profiler.

But from what I can see of the code, ReadFileContentFromZipArchive can be improved.

  • HttpResponseMessage response needs a using.
  • Instead of just using GetAsync, use SendAsync so that you can set HttpCompletionOption.ResponseHeadersRead. This allows it to buffer only the headers, not the content.
  • You are using a BufferedStream; I cannot say whether that is better or worse, so you should try with and without it (passing the response stream straight through to ZipArchive), but it should certainly have a using.
static async Task ReadFileContentFromZipArchive(string artifactLogsDownloadUri, int buildId, string artifactName, List<List<string>> stackTraceKeywords)
{
    using var request = new HttpRequestMessage(HttpMethod.Get, artifactLogsDownloadUri);
    using var response = await client.SendAsync(request, HttpCompletionOption.ResponseHeadersRead);
    using var stream = new BufferedStream(await response.Content.ReadAsStreamAsync(), 8 * 1024); // buffer size of 8KB

And ProcessLogFile is the likely cause of your data explosion:

  • Consider using a StringBuilderCache-style helper to prevent allocations; the framework itself uses an internal StringBuilderCache class for exactly this purpose (a sketch of such a helper follows after the revised code below).
  • Instead of stackTrace.AppendLine(line + "\n"); do stackTrace.AppendLine(line).AppendLine(); as this may prevent an extra allocation.
  • Move the check for lineCount to the end of the loop to avoid looping a 21st time.
  • I'm not sure what ContainsAllKeywords does as you haven't shown it, but you shouldn't ToString() a StringBuilder, nor should you ToArray() a LINQ query that you are just going to call Any() on.
  • Does it not make sense to clear the StringBuilder even if you are skipping the stack trace?
  • When creating exceptionInformation don't ToString() the StringBuilder, just pass it as is to the string interpolation.
  • Consider making responseString a StringBuilder also.
static async Task ProcessLogFile(StreamReader sr, string artifactName, List<List<string>> stackTraceKeywords)
{
    string keyword = "KEYWORD";
    bool shouldProcess = false;
    StringBuilder stackTrace = GetStringBuilderFromCache();

    while (!sr.EndOfStream)
    {
        string line = await sr.ReadLineAsync();

        if (line.Contains(keyword))
        {
            shouldProcess = true;
            //Logger.Log($"line:- {line}");
        }

        if (shouldProcess)
        {
            stackTrace.AppendLine(line);
            var lineCount = 0;
            // want next 20 lines of stack trace
            while ((line = await sr.ReadLineAsync()) != null)
            {
                stackTrace.AppendLine(line).AppendLine();
                lineCount++;
                if (lineCount > 20)
                    break;
            }

            if (stackTraceKeywords.Any(kw => ContainsAllKeywords(stackTrace, kw)))
            {
                // keywords found: skip this stack trace, but still reset the state
                shouldProcess = false;
                stackTrace.Clear();
                continue;
            }

            //FormatTheStacktrace(stacktrace);

            var exceptionInformation = $"Artifact Name:- {artifactName}\n\n{stackTrace}\n==============================================\n";
            //Logger.Log($"exceptionInformation:- {exceptionInformation}");
            responseString += exceptionInformation;

            shouldProcess = false;
            stackTrace.Clear();
        }
    }
}
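
StringBuilderCache is internal to the runtime, so the GetStringBuilderFromCache() call above stands in for a small helper you would add yourself. A minimal sketch of such a helper is below (the class name BuilderCache is just a placeholder; it assumes a simple one-builder-per-thread cache, and since await continuations can resume on a different thread the reuse is best-effort, never incorrect):

using System;
using System.Text;

internal static class BuilderCache
{
    private const int MaxCachedCapacity = 8 * 1024; // don't keep very large builders alive

    [ThreadStatic]
    private static StringBuilder t_cached;

    // Hands out a cleared, reusable StringBuilder for the current thread,
    // or a new one if nothing has been cached yet.
    public static StringBuilder GetStringBuilderFromCache()
    {
        var sb = t_cached;
        if (sb != null)
        {
            t_cached = null;
            sb.Clear();
            return sb;
        }
        return new StringBuilder(256);
    }

    // Puts a builder back so the next call on this thread can reuse it.
    public static void Release(StringBuilder sb)
    {
        if (sb.Capacity <= MaxCachedCapacity)
            t_cached = sb;
    }
}

The same idea applies to responseString: if you make it a StringBuilder instead of concatenating with +=, each appended exceptionInformation no longer allocates a brand new, ever-growing string.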

If that still doesn't help, consider replacing StreamReader with a non-allocating reader that works directly on Span<byte> buffers of the data and your relevant keywords.
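
As a rough illustration of that approach, a byte-level scan over the response stream could look something like the sketch below (this is only a sketch: it assumes UTF-8 log content and the usual System/System.IO/System.Text/System.Threading.Tasks usings, and it does not handle a keyword that straddles a buffer boundary, which a real implementation would need to):

static async Task<bool> StreamContainsKeywordAsync(Stream stream, string keyword)
{
    byte[] keywordBytes = Encoding.UTF8.GetBytes(keyword); // search pattern as raw UTF-8
    byte[] buffer = new byte[8 * 1024];
    int read;
    while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        // Span<byte>.IndexOf does a vectorized search without allocating any strings
        if (buffer.AsSpan(0, read).IndexOf(keywordBytes) >= 0)
            return true;
    }
    return false;
}

Extending this to also capture the following 20 lines means tracking newline (0x0A) offsets in the buffer instead of relying on StreamReader.ReadLineAsync, which is more work but keeps string allocations near zero.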

Charlieface