S3 Select: How to get column names of Parquet files?

Question

I am using S3 Select to read the first 10 rows of a large Parquet file stored in an S3 bucket. I can get the first 10 rows in CSV format, but they come without a header: the output contains only rows, without any column names.

Is there any way to get the headers of this Parquet file as well, just like we do for CSV files? For CSV input, the FileHeaderInfo parameter (NONE, IGNORE, or USE) controls how the header line is handled. Is there any way to do the same for Parquet files?

If not, is there any other way to read the first 10 rows of this Parquet file so that I also get the column names?
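For context, Parquet keeps its schema (including the column names) in a footer at the end of the file: the file ends with the serialized FileMetaData, then a 4-byte little-endian footer length, then the 4-byte magic `PAR1`. A minimal, stdlib-only sketch of computing the footer's byte range from the object size and its last 8 bytes follows. The class and method names are hypothetical, and decoding the Thrift-encoded FileMetaData itself is left to a Parquet library (for example parquet-mr) and is not shown.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: finds where a Parquet file's footer (FileMetaData) lives.
class ParquetFooterLocator {

    // Parquet files begin and end with this 4-byte magic number.
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /**
     * Returns the inclusive byte range {first, last} of the FileMetaData footer,
     * given the object's total size and its last 8 bytes.
     */
    static long[] footerRange(long objectSize, byte[] tail) {
        if (tail.length != 8) {
            throw new IllegalArgumentException("expected exactly the last 8 bytes");
        }
        if (!Arrays.equals(Arrays.copyOfRange(tail, 4, 8), MAGIC)) {
            throw new IllegalArgumentException("missing trailing PAR1 magic");
        }
        // The 4 bytes before the trailing magic hold the footer length, little-endian.
        long footerLength = Integer.toUnsignedLong(
                ByteBuffer.wrap(tail, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt());
        long last = objectSize - 9;            // byte just before the length field
        long first = last - footerLength + 1;
        if (first < MAGIC.length) {            // footer cannot overlap the leading magic
            throw new IllegalArgumentException("footer length exceeds object size");
        }
        return new long[] {first, last};
    }
}
```

Assuming the AWS SDK for Java v1, the 8-byte tail could be fetched with a ranged GET (`GetObjectRequest.setRange(size - 8, size - 1)`) and the footer with a second ranged GET over the returned range; this wiring is an untested assumption, not something S3 Select does for you.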

I am reading it through S3 Select, using the InputSerialization logic below as described in the docs:

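        import com.amazonaws.services.s3.model.CSVOutput;
        import com.amazonaws.services.s3.model.CompressionType;
        import com.amazonaws.services.s3.model.ExpressionType;
        import com.amazonaws.services.s3.model.InputSerialization;
        import com.amazonaws.services.s3.model.OutputSerialization;
        import com.amazonaws.services.s3.model.ParquetInput;
        import com.amazonaws.services.s3.model.SelectObjectContentRequest;

        // Builds an S3 Select request that reads a Parquet object and returns CSV rows.
        static SelectObjectContentRequest buildRequest(String bucket, String key, String query) {
            SelectObjectContentRequest request = new SelectObjectContentRequest();
            request.setBucketName(bucket);
            request.setKey(key);
            request.setExpression(query); // SQL text, e.g. "SELECT * FROM S3Object s LIMIT 10" (illustrative)
            request.setExpressionType(ExpressionType.SQL);

            // Parquet input; whole-object compression is not supported for Parquet, so NONE.
            InputSerialization inputSerialization = new InputSerialization();
            inputSerialization.setParquet(new ParquetInput());
            inputSerialization.setCompressionType(CompressionType.NONE);
            request.setInputSerialization(inputSerialization);

            // CSV output: rows only, no header line.
            OutputSerialization outputSerialization = new OutputSerialization();
            outputSerialization.setCsv(new CSVOutput());
            request.setOutputSerialization(outputSerialization);

            return request;
        }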
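
A possible workaround, not verified here: request JSON output instead of CSV by replacing `outputSerialization.setCsv(new CSVOutput())` with `outputSerialization.setJson(new JSONOutput())` (both classes are in `com.amazonaws.services.s3.model` in SDK v1). As far as I know, S3 Select then returns each record as a JSON object keyed by column name, one record per line by default, so the column names could be read from the first record. The stdlib-only helper below is a hypothetical sketch that lists the top-level keys of one such record; it does not decode escape sequences, and a real JSON library such as Jackson would be the more robust choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the top-level keys of one JSON object, in order.
class JsonRecordKeys {

    static List<String> topLevelKeys(String json) {
        List<String> keys = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String lastString = null; // most recently closed string literal, if still pending
        int depth = 0;
        boolean inString = false;
        boolean escaped = false;

        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (inString) {
                // Inside a string literal: copy characters as written (escapes are not decoded).
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                    lastString = current.toString();
                    continue;
                }
                current.append(c);
                continue;
            }
            if (c == '"') {
                inString = true;
                current.setLength(0);
            } else if (c == ':') {
                // A string followed by ':' directly inside the outer object is a key.
                if (depth == 1 && lastString != null) {
                    keys.add(lastString);
                }
                lastString = null;
            } else if (c == '{' || c == '[') {
                depth++;
                lastString = null;
            } else if (c == '}' || c == ']') {
                depth--;
                lastString = null;
            } else if (!Character.isWhitespace(c)) {
                lastString = null; // commas, numbers and literals end any pending key
            }
        }
        return keys;
    }
}
```

Only keys at depth 1 are collected, so nested objects (for example a struct column) contribute their outer column name but not their inner field names.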
