I am trying to perform some transformations on some CSV files in Azure Data Lake Storage Gen2. As a first step, I am downloading the files from the data lake using a DataLakeDirectoryClient object from the Azure.Storage.Files.DataLake NuGet package. I have a list of the file names, which I loop over, creating a DataLakeFileClient object for each.
There are 300 files in the folder, each roughly 190 KB in size.
The problem is that my code hangs after downloading 50 files: no exception is thrown, so I can't identify the issue. I'm testing this in a console application, and the console remains open until I manually stop the program.
directoryClient is a DataLakeDirectoryClient object.
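For context, it is created roughly like this (the connection string, file system name, and folder name below are placeholders):

var serviceClient = new DataLakeServiceClient(connectionString);
var fileSystemClient = serviceClient.GetFileSystemClient("my-file-system");
var directoryClient = fileSystemClient.GetDirectoryClient("my-folder");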
fileNames is a list of strings, e.g. "file1.csv", "file2.csv", ..., "file300.csv".
var fileContents = new List<Stream>();
foreach (var fileName in fileNames)
{
    var fileClient = directoryClient.GetFileClient(fileName);
    var fileDownloadResponse = await fileClient.ReadAsync(); // Code hangs here on 51st file
    fileContents.Add(fileDownloadResponse.Value.Content);
    Console.WriteLine($"{fileName} downloaded.");
}
On the console, I see that the first 50 files are downloaded. For the 51st file, the fileClient is created successfully, but the ReadAsync call never returns a response.
I have separately tried downloading just the 51st file, and that works with no issues, so it looks like this has nothing to do with that file in particular.
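That isolated test was roughly the following (the file name is just for illustration):

var singleFileClient = directoryClient.GetFileClient("file51.csv");
var singleFileResponse = await singleFileClient.ReadAsync();
using (var reader = new StreamReader(singleFileResponse.Value.Content))
{
    var text = await reader.ReadToEndAsync();
    Console.WriteLine($"file51.csv downloaded, {text.Length} characters.");
}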
Since I could not find an explanation for this, I refactored my code. Rather than trying to download all 300 files up front, I now download one file at a time, perform the transformations I need, and then move on to the next file (see the sketch below). This works, and all 300 files have been transformed successfully.
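The refactored loop looks roughly like this, where TransformAsync is just a placeholder for my transformation logic:

foreach (var fileName in fileNames)
{
    var fileClient = directoryClient.GetFileClient(fileName);
    var fileDownloadResponse = await fileClient.ReadAsync();

    // Consume and dispose the content stream before moving to the next file.
    using (var contentStream = fileDownloadResponse.Value.Content)
    {
        await TransformAsync(fileName, contentStream); // placeholder for the transformation step
    }

    Console.WriteLine($"{fileName} transformed.");
}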
However, I still wanted to post this question because I would like to know exactly what went wrong with my original attempt, just to improve my own understanding of C#.