I have a Spark job that reads data from a Hive table.
For example:
r = spark.sql("select * from table")
I need to write the result to an HDFS location as ~256 MB Parquet files.
I am trying:
r.write.parquet("/data_dev/work/experian/test11")
This generates ~30 MB files, but I need it to generate ~256 MB files.
I also tried setting this option:
r.write.option("parquet.block.size", 256 * 1024 * 1024).parquet("/path")
Still, the generated files are ~30 MB.
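In case it helps to reproduce the issue, here is a minimal self-contained version of the job. The app name is arbitrary, and the table name and output path are placeholders standing in for the real ones:

from pyspark.sql import SparkSession

# Session with Hive support so spark.sql can read the Hive table
spark = SparkSession.builder \
    .appName("hive_to_parquet") \
    .enableHiveSupport() \
    .getOrCreate()

# Read the full table (placeholder table name)
r = spark.sql("select * from table")

# Same write as above, with the parquet.block.size option applied
r.write.option("parquet.block.size", 256 * 1024 * 1024) \
    .parquet("/data_dev/work/experian/test11")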