I developed a .NET-based HTTP-triggered Azure Function with a memory limit of 1 GB. My code consumes more memory than this limit, so the memory set limit is reached, the worker is stopped, the in-flight HTTP request is aborted, and the caller receives a 5XX error.
The function receives a build ID as a parameter and fetches a list of about 140 BuildArtifacts. Each artifact has a download URL for a zip file containing a log file in .txt format. The function opens each zip in read-only mode (ZipArchiveMode.Read), filters the stack traces in each .txt file, and appends them to a string variable called responseString, which is returned as the response.
The problem is that the function exceeds the 1 GB memory limit, and throttling the reads to stay under the limit pushes the response time past 10 minutes, the maximum execution time allowed for the function, so it is terminated mid-request. Additionally, I cannot create or modify files in the wwwroot directory or any other directory that is part of the deployment package, and Path.GetTempPath() is not an option because the temp folder is capped at roughly 100 MB, which is not enough to hold the zip contents.
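For reference, the artifact payload deserializes into roughly these shapes; I have inferred them from the property accesses in the code below, so the real DTOs have more fields than this:

public class BuildArtifacts
{
    public List<Artifact> value { get; set; } // the artifact list returned by the API
}

public class Artifact
{
    public string name { get; set; } // e.g. an artifact name containing "Logs"
    public ArtifactResource resource { get; set; }
}

public class ArtifactResource
{
    public string downloadUrl { get; set; } // download URL of the zip file
}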
Code:-
int maxParallelism = 15; // maximum number of artifacts to process in parallel
var semaphore = new SemaphoreSlim(maxParallelism);
var tasks = new List<Task>();
var buildArtifacts = await GetBuildArtifacts(buildId);
foreach (var artifact in buildArtifacts.value)
{
    await semaphore.WaitAsync(); // wait for a free slot before starting the next task
    Logger.Log(artifact.name);
    tasks.Add(WriteExceptionsToFile(artifact, stackTraceKeywords, context, buildId, semaphore));
}
await Task.WhenAll(tasks).ConfigureAwait(false);
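For context, GetBuildArtifacts is essentially a single call to the Azure DevOps build artifacts endpoint. A trimmed sketch of what it does (the ORG/PROJECT placeholders, the api-version, the auth setup, and the Newtonsoft deserialization are illustrative, not my exact code):

static async Task<BuildArtifacts> GetBuildArtifacts(int buildId)
{
    // ORG and PROJECT are placeholders; client already carries the auth header
    var url = $"https://dev.azure.com/ORG/PROJECT/_apis/build/builds/{buildId}/artifacts?api-version=7.0";
    var json = await client.GetStringAsync(url);
    return JsonConvert.DeserializeObject<BuildArtifacts>(json); // Newtonsoft.Json
}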
public static async Task WriteExceptionsToFile(Artifact artifact, List<List<string>> stackTraceKeywords, Microsoft.Azure.WebJobs.ExecutionContext context, int buildId, SemaphoreSlim semaphore)
{
    try
    {
        var artifactName = artifact.name;
        if (artifactName.Contains("Logs")) // only the log artifacts are of interest
        {
            var logsUri = artifact.resource.downloadUrl;
            await ReadFileContentFromZipArchive(logsUri, buildId, artifactName, stackTraceKeywords);
        }
    }
    finally
    {
        semaphore.Release(); // always free the slot, even if the download throws
    }
}
static async Task ReadFileContentFromZipArchive(string artifactLogsDownloadUri, int buildId, string artifactName, List<List<string>> stackTraceKeywords)
{
    using (HttpResponseMessage response = await client.GetAsync(artifactLogsDownloadUri))
    {
        response.EnsureSuccessStatusCode();
        // GetAsync buffers the whole response body by default, so this stream
        // is reading from an in-memory copy of the zip
        using (var stream = new BufferedStream(await response.Content.ReadAsStreamAsync(), 8 * 1024)) // 8 KB buffer
        using (var archive = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                if (!string.IsNullOrEmpty(entry.Name)) // skip directory entries
                {
                    using (var sr = new StreamReader(entry.Open(), Encoding.UTF8))
                    {
                        await ProcessLogFile(sr, artifactName, stackTraceKeywords);
                    }
                }
            }
        }
    }
}
static async Task ProcessLogFile(StreamReader sr, string artifactName, List<List<string>> stackTraceKeywords)
{
    string keyword = "KEYWORD";
    StringBuilder stackTrace = new StringBuilder();
    while (!sr.EndOfStream)
    {
        string line = await sr.ReadLineAsync();
        if (!line.Contains(keyword))
        {
            continue;
        }
        // found the start of a stack trace; capture it plus the next 20 lines
        stackTrace.AppendLine(line);
        int lineCount = 0;
        while (lineCount < 20 && (line = await sr.ReadLineAsync()) != null)
        {
            stackTrace.AppendLine(line);
            lineCount++;
        }
        // skip any trace that matches one of the keyword sets
        bool skip = stackTraceKeywords.Any(kw => ContainsAllKeywords(stackTrace.ToString(), kw));
        if (!skip)
        {
            var exceptionInformation = $"Artifact Name:- {artifactName}\n\n{stackTrace}\n==============================================\n";
            responseString += exceptionInformation; // shared field, appended from multiple parallel tasks
        }
        stackTrace.Clear(); // reset for the next trace either way
    }
}
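For completeness, the HTTP entry point that ties this together looks roughly like the sketch below; the function name, route, and parameter handling are simplified stand-ins, and the in-process WebJobs model is implied by the ExecutionContext parameter above:

[FunctionName("GetFilteredStackTraces")] // hypothetical name
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequest req,
    Microsoft.Azure.WebJobs.ExecutionContext context)
{
    int buildId = int.Parse(req.Query["buildId"]); // build ID passed as a query parameter
    responseString = string.Empty; // reset the shared accumulator per invocation
    // ... the fan-out snippet at the top runs here ...
    return new OkObjectResult(responseString); // filtered stack traces as the response body
}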
How can I optimize the code so that the memory limit isn't exceeded and the response time stays under 10 minutes?
Any suggestions or advice on the approach would be appreciated.