I am working on a problem where I have a large number of small compressed text files. Each file is approximately 10-20 KB, and in total there are TBs of data. I need to load these files into Hive; later, Tableau will use the Hive tables for report generation. I am running on AWS.
What is the best way to load this data into Hive? My current plan is:
- Move the compressed files into mappers.
- Decompress them using a map-only job.
- Process the resulting text files.
- Create a Hive table.
- Load the data from the mappers into the Hive table. (My concern lies in this step: as I understand it, data can be loaded into a Hive table by multiple mappers in parallel, but I am not sure.)
- Use the Hive tables in the reporting tool.
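To make the small-files part concrete, this is roughly the kind of pre-consolidation step I have in mind before loading anything into Hive: merge many tiny .gz files into a few large ones so HDFS/Hive sees fewer, bigger files. This is only a sketch; the paths, file naming, and the 128 MB target size are illustrative, not anything I have in production.

```python
import gzip
from pathlib import Path

# Aim for large output files, e.g. near the HDFS block size.
# The exact threshold here is an assumption for illustration.
TARGET_BYTES = 128 * 1024 * 1024

def consolidate(src_dir: str, out_dir: str, target_bytes: int = TARGET_BYTES):
    """Merge many small .gz text files into fewer large .gz files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunk_idx, written = 0, 0
    sink = gzip.open(out / f"part-{chunk_idx:05d}.txt.gz", "wb")
    for small in sorted(Path(src_dir).glob("*.gz")):
        with gzip.open(small, "rb") as f:
            data = f.read()          # each input is only 10-20 KB
        sink.write(data)
        written += len(data)
        if written >= target_bytes:  # roll over to a new big output file
            sink.close()
            chunk_idx, written = chunk_idx + 1, 0
            sink = gzip.open(out / f"part-{chunk_idx:05d}.txt.gz", "wb")
    sink.close()
```

The idea is that a Hive external table pointed at the consolidated directory would then launch far fewer map tasks than one pointed at millions of 10-20 KB files.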
Please suggest whether there is a better way to handle this scenario.
Thanks