I have many small Parquet files in a given HDFS location (the count keeps growing within a given month, since we receive two or more files per day). When I read these files in Spark 2.1, the time taken is high, and it increases further as more small files are added to the location.
Since the files are small, I do not want to partition any further in HDFS.
Partitions are created as directories on HDFS, and the files are then placed in those directories. The file format is Parquet.
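For context, here is a simplified version of how I read the data today (the paths and partition names are illustrative, not my actual ones):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("ReadSmallParquetFiles")
      .getOrCreate()

    // Reading the partition directory picks up every small file under it.
    // Listing the files and reading each Parquet footer scales with the
    // file count, which is where the slowdown shows up.
    val df = spark.read.parquet("hdfs:///data/events/year=2018/month=01")
    println(df.count())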
Is there another format or process for reading all the small files at once, so that I can reduce the overall read time?
Note: 1) Writing a program that merges all the small files into a single file would add processing overhead to the overall SLA of my process, so I would keep that as a last option (a rough sketch of what I mean is below).
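If I did end up going that route, the compaction would look roughly like this, reusing the spark session from above; the coalesce count and output path are assumptions, not settled choices:

    // Rewrite the month's small files as a few larger ones. Writing to a
    // separate output path avoids clobbering the source directory while
    // it is still being read.
    val merged = spark.read
      .parquet("hdfs:///data/events/year=2018/month=01")
      .coalesce(4) // arbitrary; aim for files near the HDFS block size

    merged.write
      .mode("overwrite")
      .parquet("hdfs:///data/events_compacted/year=2018/month=01")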