How can I concatenate small Parquet files in Hive when the following conditions apply?
- Partitions are created dynamically on the Hive table.
- The table is EXTERNAL.
Solution tried so far (for ORC files only, and it hits a bug): for ORC files I ran the command below in a loop over all partition values, and it completes without errors. However, data is lost after the concatenation finishes, which is a known Hive bug: https://issues.apache.org/jira/browse/HIVE-17280
I am running the Hive query on an EMR cluster with Hive 2.3.3, and this bug was not fixed until Hive 3.0.0.
Command used to achieve this for ORC files (I need to do the same for Parquet files):
ALTER TABLE HIVE_DB.HIVE_TABLE_NM PARTITION(partition_field_nm ='${partition_value}') CONCATENATE;
This only works for ORC files: Hive's CONCATENATE supports RCFile and ORC tables, not Parquet.
I need a similar way to concatenate small Parquet files into bigger files.
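The per-partition loop described above can be sketched as a small driver that builds one CONCATENATE statement per partition value. This is a minimal sketch, not the author's actual script: the helper names `parse_show_partitions` and `build_concatenate_statements` are hypothetical, it assumes a single partition column, and it assumes the partition values come from the plain `key=value` lines that `SHOW PARTITIONS` prints. It only generates the HiveQL strings; how they are submitted to Hive is left open, and on Hive 2.3.3 the ORC CONCATENATE itself is still subject to HIVE-17280.

```python
# Hypothetical driver for the per-partition CONCATENATE loop.
# Assumes one partition column and SHOW PARTITIONS output of the form
# "partition_field_nm=value", one partition per line.


def parse_show_partitions(output, partition_field):
    """Extract partition values for `partition_field` from SHOW PARTITIONS output."""
    values = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("=")
        if key == partition_field:
            values.append(value)
    return values


def build_concatenate_statements(table, partition_field, partition_values):
    """Return one ALTER TABLE ... CONCATENATE statement per partition value."""
    statements = []
    for value in partition_values:
        # Escape single quotes so the value stays a valid HiveQL string literal.
        escaped = str(value).replace("'", "\\'")
        statements.append(
            f"ALTER TABLE {table} PARTITION({partition_field}='{escaped}') CONCATENATE;"
        )
    return statements


if __name__ == "__main__":
    # Example input standing in for real SHOW PARTITIONS output (assumed values).
    sample = "partition_field_nm=2019-01-01\npartition_field_nm=2019-01-02\n"
    values = parse_show_partitions(sample, "partition_field_nm")
    for stmt in build_concatenate_statements(
        "HIVE_DB.HIVE_TABLE_NM", "partition_field_nm", values
    ):
        print(stmt)
```

Generating the statements separately from running them makes it easy to review the list, or to run one partition at a time and verify row counts before and after, given the data-loss bug mentioned above.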