I'm using SPARK to read files in hdfs. There is a scenario, where we are getting files as chunks from legacy system in csv format.
ID1_FILENAMEA_1.csv
ID1_FILENAMEA_2.csv
ID1_FILENAMEA_3.csv
ID1_FILENAMEA_4.csv
ID2_FILENAMEA_1.csv
ID2_FILENAMEA_2.csv
ID2_FILENAMEA_3.csv
This files are loaded to FILENAMEA in HIVE using HiveWareHouse Connector, with few transformation like adding default values. Similarly we have around 70 tables. Hive tables are created in ORC format. Tables are partitioned on ID. Right now, I'm processing all these files one by one. It's taking much time.
I want to make this process much faster. Files will be in GBs.
Is there is any way to read all the FILENAMEA files at the same time and load it to HIVE tables.