I am using Hive.
I have 24 JSON files with a total size of 300 MB in one folder. I created an external table (table1) and loaded the 24 files into it.
When I run a select query on table1, I see 3 mappers and 1 reducer running.
After that, I created a second external table (table2).
I compressed my input files (the folder containing the 24 files) with BZIP2.
The compression produced 24 files with the ".bz2" extension (i.e. file1.bz2, …, file24.bz2).
Then I loaded the compressed files into table2.
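For reference, the setup looks roughly like the sketch below. The HDFS paths, the single string column, and the COUNT(*) query are placeholders I am assuming for illustration; the real tables and query may differ.

```sql
-- Hypothetical sketch: the locations, column name, and query are assumptions.

-- table1: folder holding the 24 uncompressed JSON files (about 300 MB in total).
CREATE EXTERNAL TABLE table1 (json_line STRING)
STORED AS TEXTFILE
LOCATION '/data/json_raw';

-- table2: folder holding the same 24 files compressed with bzip2.
-- Text-format tables read .bz2 files directly; the codec is chosen from the extension.
CREATE EXTERNAL TABLE table2 (json_line STRING)
STORED AS TEXTFILE
LOCATION '/data/json_bz2';

-- Example of a select that runs a map phase plus a single reducer.
SELECT COUNT(*) FROM table1;
SELECT COUNT(*) FROM table2;
```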
Now, when I run a select query on table2, it uses 24 mappers and 1 reducer. I also observed that the CPU time is higher than with the uncompressed files.
How can I reduce the number of mappers when the data is compressed (i.e. for the select query on table2)?
How can I reduce the CPU time when the data is compressed (i.e. for the select query on table2)? And how does CPU time affect performance?