
I have a Spark job that reads data from a Hive table.
Ex:

r = spark.sql("select * from table")

and I have to write the result to an HDFS location as 256 MB Parquet files.

I am trying

r.write.parquet("/data_dev/work/experian/test11")

This generates ~30 MB files, but I need it to generate 256 MB files.

I also tried this configuration:

r.write.option("parquet.block.size", 256 * 1024 * 1024) \
    .parquet("/path")

Still, the generated files seem to be ~30 MB.


1 Answer


I don't think there is any direct way to control the output file size in Spark. Please refer to this link:

How do you control the size of the output file?
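
That said, a common workaround is to control the number of output files rather than the file size directly: Spark writes one Parquet file per partition, so repartitioning to an estimated file count gets you close to a target size. The sketch below is only an illustration; the 8 GB input estimate and the output path are placeholders, not values taken from the question:

target_file_size = 256 * 1024 * 1024      # desired size per output file (256 MB)
total_bytes = 8 * 1024 * 1024 * 1024      # placeholder: estimated size of the table's data

# One Parquet file is written per partition, so the partition count
# (roughly) determines the size of each output file.
num_files = max(1, total_bytes // target_file_size)

r = spark.sql("select * from table")
r.repartition(num_files).write.parquet("/data_dev/work/experian/test11")

Note that parquet.block.size controls the row-group size inside each Parquet file, not the size of the file itself, which is why setting it did not change the ~30 MB outputs. Also, because Parquet compresses data, the on-disk size will usually be smaller than the in-memory estimate, so the file count may need adjusting.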
