In a Spring Boot application, I'm using Hadoop to read a Parquet file from an Amazon S3 bucket. After opening the target file as an InputStream, I want to read it. Here is my code:
var s3 = "s3a://bucketX/file.parquet";
Path s3Path = new Path(s3);
Configuration configuration = new Configuration();
configuration.set("fs.s3a.aws.credentials.profileName", "profileX"); // profileX has permission to read the file
configuration.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
configuration.set("fs.s3a.endpoint", "s3-eu-west-3.amazonaws.com");
configuration.set("fs.s3a.aws.credentials.provider", "com.amazonaws.auth.profile.ProfileCredentialsProvider");
var s3fs = new S3AFileSystem();
s3fs.initialize(new URI(s3), configuration);
InputStream s3InputStream = s3fs.open(s3Path);
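As a side note, instead of instantiating S3AFileSystem by hand, the same stream can be obtained through Hadoop's generic FileSystem factory, which resolves the implementation class from the fs.s3a.impl setting above. A minimal sketch, assuming the same s3 and configuration variables:

```java
import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// FileSystem.get looks up the scheme (s3a) in the Configuration
// and returns an initialized S3AFileSystem instance.
FileSystem fs = FileSystem.get(new URI(s3), configuration);
InputStream in = fs.open(new Path(s3));
```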
Here is my pom.xml configuration:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>3.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.3.1</version>
</dependency>
<dependency>
    <groupId>org.apache.parquet</groupId>
    <artifactId>parquet-hadoop</artifactId>
    <version>1.12.3</version>
</dependency>
ParquetFileReader expects a HadoopInputFile as input. How can I convert the InputStream to a HadoopInputFile?
ParquetFileReader reader = ParquetFileReader.open(convertToHadoopInputFile(s3InputStream));
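It may be that no conversion from the raw InputStream is needed at all: parquet-hadoop's HadoopInputFile.fromPath factory can wrap the S3 path and Configuration directly, and ParquetFileReader then performs its own seekable reads against S3A. A sketch, assuming the s3 and configuration variables above:

```java
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

// Build an InputFile from the path + configuration instead of an InputStream.
HadoopInputFile inputFile = HadoopInputFile.fromPath(new Path(s3), configuration);
try (ParquetFileReader reader = ParquetFileReader.open(inputFile)) {
    var footer = reader.getFooter(); // e.g. inspect the Parquet metadata
}
```

With this approach, the earlier s3fs.open(...) call becomes unnecessary, since ParquetFileReader opens the file itself (Parquet needs random access to read the footer, which a plain InputStream cannot provide).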