I am using S3 Select to read the first 10 rows of a large Parquet file stored in an S3 bucket. I am able to get the first 10 rows in CSV format, but they come without a header: the output contains only the data rows, with no column names.
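For context, the expression behind "first 10 rows" can be sketched as below. This is only a minimal illustration, assuming a plain SELECT with a LIMIT clause; the class name S3SelectQuery and the method firstRows are made up for this example, and the actual expression may differ.

```java
public class S3SelectQuery {
    // Hypothetical helper: builds an S3 Select SQL expression that
    // returns at most `limit` records from the object.
    public static String firstRows(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        return "SELECT * FROM S3Object s LIMIT " + limit;
    }

    public static void main(String[] args) {
        // prints: SELECT * FROM S3Object s LIMIT 10
        System.out.println(firstRows(10));
    }
}
```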
Is there any way to get the headers of this Parquet file as well, just as we do for CSV files? For CSV files, we can set the FileHeaderInfo parameter to IGNORE to fetch headers. Is there any way to do the same for Parquet files?
If not, is there any other way to read the first 10 rows of this Parquet file so that I also get the column headers?
I am reading through S3 Select using the InputSerialization logic below, as described in the docs:
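import com.amazonaws.services.s3.model.*;

// Hypothetical wrapper method; the original snippet shows only the body.
static SelectObjectContentRequest buildRequest(String bucket, String key, String query) {
    SelectObjectContentRequest request = new SelectObjectContentRequest();
    request.setBucketName(bucket);
    request.setKey(key);
    request.setExpression(query);
    request.setExpressionType(ExpressionType.SQL);

    // Input: a Parquet object with no additional compression.
    InputSerialization inputSerialization = new InputSerialization();
    inputSerialization.setParquet(new ParquetInput());
    inputSerialization.setCompressionType(CompressionType.NONE);
    request.setInputSerialization(inputSerialization);

    // Output: CSV records with default settings.
    OutputSerialization outputSerialization = new OutputSerialization();
    outputSerialization.setCsv(new CSVOutput());
    request.setOutputSerialization(outputSerialization);

    return request;
}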
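
One possible workaround, sketched here rather than confirmed: if the output serialization is switched from CSV to JSON (outputSerialization.setJson(new JSONOutput()) in the same SDK), each record comes back as a JSON object whose keys are the column names, so the header can be derived from the first record. The class below is a hand-rolled illustration with a hypothetical name (JsonRecordKeys); it assumes one JSON object per line and does not decode escape sequences, so a real JSON library would be the safer choice in practice.

```java
import java.util.ArrayList;
import java.util.List;

public class JsonRecordKeys {
    // Returns the top-level keys of a single JSON object, in order of appearance.
    // Illustrative scanner only: escape sequences are skipped over, not decoded.
    public static List<String> topLevelKeys(String jsonLine) {
        List<String> keys = new ArrayList<>();
        int depth = 0;
        int i = 0;
        while (i < jsonLine.length()) {
            char c = jsonLine.charAt(i);
            if (c == '{' || c == '[') {
                depth++;
                i++;
            } else if (c == '}' || c == ']') {
                depth--;
                i++;
            } else if (c == '"') {
                // Consume the whole string literal so that ':' or braces
                // inside string values are never mistaken for structure.
                StringBuilder text = new StringBuilder();
                i++;
                while (i < jsonLine.length() && jsonLine.charAt(i) != '"') {
                    if (jsonLine.charAt(i) == '\\' && i + 1 < jsonLine.length()) {
                        i++; // drop the backslash, keep the following character
                    }
                    text.append(jsonLine.charAt(i));
                    i++;
                }
                i++; // step past the closing quote
                // A string directly followed by ':' at depth 1 is a top-level key.
                int j = i;
                while (j < jsonLine.length() && Character.isWhitespace(jsonLine.charAt(j))) {
                    j++;
                }
                if (depth == 1 && j < jsonLine.length() && jsonLine.charAt(j) == ':') {
                    keys.add(text.toString());
                }
            } else {
                i++;
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Made-up sample record, standing in for one line of S3 Select JSON output.
        String firstRecord = "{\"id\":1,\"name\":\"Ann\",\"city\":\"Oslo\"}";
        // prints: id,name,city
        System.out.println(String.join(",", topLevelKeys(firstRecord)));
    }
}
```

The resulting key list could then be written as the first CSV line ahead of the data rows, assuming every record carries the same columns.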