
How do I control the Parquet output file size? I've tried tweaking a number of settings, but I still end up with a single large Parquet file.

I've created a partitioned external table and insert into it via an INSERT OVERWRITE statement.
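For context, the table looks roughly like this (a minimal sketch; the column types, exact schema, and storage location are assumptions inferred from the query below):

CREATE EXTERNAL TABLE my_table (
  x STRING,   -- type assumed
  y BIGINT    -- type assumed; holds the sum(y) aggregate
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION '/path/to/my_table';   -- hypothetical location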

SET hive.auto.convert.join=false;
SET hive.support.concurrency=false;
SET hive.exec.reducers.max=600;
SET hive.exec.parallel=true;
SET hive.exec.compress.intermediate=true;
SET hive.intermediate.compression.codec=org.apache.hadoop.io.compress.Lz4Codec;
SET mapreduce.map.output.compress=false;
SET mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.Lz4Codec;
SET hive.groupby.orderby.position.alias=true;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.optimize.sort.dynamic.partition=true;
SET hive.resultset.use.unique.column.names=false;
SET mapred.reduce.tasks=100;
-- 268435456 bytes = 256 MB: target HDFS block size and Parquet row-group size
SET dfs.blocksize=268435456;
SET parquet.block.size=268435456;

-- With dynamic partitioning, the partition column (dt) must come last in the SELECT.
INSERT OVERWRITE TABLE my_table PARTITION (dt)
SELECT x, sum(y) AS y, dt FROM managed_table GROUP BY dt, x;

Using the dfs.blocksize and parquet.block.size parameters, I was hoping to generate 256 MB Parquet files, but I'm getting a single 4 GB Parquet file.
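For what it's worth, one commonly suggested workaround (a sketch only, assuming Hive on MapReduce as above, and not verified against this exact setup): with hive.optimize.sort.dynamic.partition=true, all rows for a given partition are routed to a single reducer, which writes a single file per partition regardless of the block-size settings. Disabling it and distributing rows across reducers explicitly splits each partition's output into multiple files:

SET hive.optimize.sort.dynamic.partition=false;
SET mapred.reduce.tasks=100;

-- FLOOR(RAND()*16) is a hypothetical fan-out factor: each partition's rows
-- are spread over up to 16 reducers, yielding up to 16 files per partition.
INSERT OVERWRITE TABLE my_table PARTITION (dt)
SELECT x, sum(y) AS y, dt
FROM managed_table
GROUP BY dt, x
DISTRIBUTE BY dt, FLOOR(RAND() * 16);

The resulting file sizes then follow from the partition size divided by the fan-out, so the factor can be tuned toward the 256 MB target.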

Jammy
  • Possible duplicate of https://stackoverflow.com/questions/30848775/set-parquet-snappy-output-file-size-is-hive – sathya Jul 11 '20 at 06:59
  • Read this: https://stackoverflow.com/a/45350927/2700344 and this also: https://stackoverflow.com/a/55375261/2700344 – leftjoin Jul 11 '20 at 09:24

0 Answers