I have some very large Parquet files stored as blobs, each with more than 600k rows, and I'd like to retrieve only the first 100 rows so I can send them to my client app. This is the code I currently use for this:
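```csharp
private async Task<Table> getParquetAsTable(BlobClient blob)
{
    Table table;
    using (var stream = await blob.OpenReadAsync())
    using (var memory = new MemoryStream())
    {
        await stream.CopyToAsync(memory);      // downloads the entire blob
        memory.Position = 0;                   // rewind before reading
        using (var parquetReader = new ParquetReader(memory))
        {
            table = parquetReader.ReadAsTable(); // decodes every row
        }
    }

    // Keep only the first 100 rows for the client.
    var first100 = new Table(table.Schema);
    foreach (var row in table.Take(100))
        first100.Add(row);
    return first100;
}
```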
However, this process is slow: `await stream.CopyToAsync(memory);` takes 20 seconds and `table = parquetReader.ReadAsTable();` takes another 15, so in total I have to wait 35 seconds.
Is there a way to limit this stream and get the first 100 rows directly, without downloading all of the rows, converting them with `ReadAsTable()`, and only then taking the first 100?
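One direction that may help, sketched here under several assumptions: Parquet files are split into row groups, and Parquet.Net 3.x (the version whose synchronous `ParquetReader(Stream)` constructor and `ReadAsTable()` the code above uses) can open a single row group with `OpenRowGroupReader` and read its columns with `ReadColumn`. Because the reader locates data via the file footer, handing it a seekable blob stream instead of a full `MemoryStream` copy should mean only the footer and the row groups you actually open are fetched. As far as I know, the stream returned by `OpenReadAsync` in Azure.Storage.Blobs 12.x is seekable and loads data lazily, but the sketch checks `CanSeek` and falls back to a full copy if not. `ParquetPreview`, `ReadFirstRowsAsync` and `ToRows` are hypothetical names, the sketch assumes a flat schema (no repeated or nested fields, so each column array holds one value per row), and returning dictionaries is just one illustrative shape for the client.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Parquet;
using Parquet.Data;

// Hypothetical helper; assumes Parquet.Net 3.x and a flat schema.
public static class ParquetPreview
{
    public static async Task<List<Dictionary<string, object>>> ReadFirstRowsAsync(
        BlobClient blob, int limit = 100)
    {
        using (Stream blobStream = await blob.OpenReadAsync())
        {
            // ParquetReader seeks to the footer, so it needs a seekable stream.
            // Only if the blob stream cannot seek do we fall back to a full copy.
            Stream input = blobStream;
            if (!blobStream.CanSeek)
            {
                var memory = new MemoryStream();
                await blobStream.CopyToAsync(memory);
                memory.Position = 0;
                input = memory;
            }

            using (var reader = new ParquetReader(input))
            {
                DataField[] fields = reader.Schema.GetDataFields();
                string[] names = fields.Select(f => f.Name).ToArray();
                var rows = new List<Dictionary<string, object>>();

                // Visit row groups in order; stop once enough rows are collected.
                for (int g = 0; g < reader.RowGroupCount && rows.Count < limit; g++)
                {
                    using (ParquetRowGroupReader groupReader = reader.OpenRowGroupReader(g))
                    {
                        // Reads this row group's chunk of each column.
                        Array[] columns = fields
                            .Select(f => groupReader.ReadColumn(f).Data)
                            .ToArray();
                        rows.AddRange(ToRows(names, columns, groupReader.RowCount, limit - rows.Count));
                    }
                }
                return rows;
            }
        }
    }

    // Converts column arrays into per-row dictionaries, keeping at most `limit` rows.
    public static List<Dictionary<string, object>> ToRows(
        IReadOnlyList<string> names, IReadOnlyList<Array> columns, long rowCount, int limit)
    {
        var rows = new List<Dictionary<string, object>>();
        long count = Math.Min(rowCount, limit);
        for (long i = 0; i < count; i++)
        {
            var row = new Dictionary<string, object>();
            for (int c = 0; c < names.Count; c++)
                row[names[c]] = columns[c].GetValue(i);
            rows.Add(row);
        }
        return rows;
    }
}
```

Whether this saves much time depends on how the files were written: if all 600k rows sit in a single row group, the first `OpenRowGroupReader(0)` still has to fetch and decode every column chunk of that group, so you would likely also need to write the files with smaller row groups.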