I have this code snippet:
private Table getParquetAsTable(BlobClient blob)
{
    var stream = blob.OpenRead();
    var parquetReader = new ParquetReader(stream);
    return parquetReader.ReadAsTable();
}
What this code does is read a Parquet file from Azure Blob Storage. If my file has <= 10 columns, it is returned quickly; for bigger files, however, I have to wait more than 40 seconds for it to be returned. While debugging, I noticed that the slow part is the return parquetReader.ReadAsTable() call. I use the ParquetDotNet library to read the Parquet file. Is there a way to speed this up? Can I limit the stream, to the first 100 bytes for example, and have it returned faster? If so, how can I do this?