How to specify Parquet Block Size and Page Size in PySpark? I have searched everywhere but cannot find any documentation for the function calls or the import libraries.
1 Answer
According to the spark-user mailing list archives, in Scala you would set the values on the Hadoop configuration:
sc.hadoopConfiguration.setInt("dfs.blocksize", some_value)
sc.hadoopConfiguration.setInt("parquet.block.size", some_value)
so the equivalent in PySpark (via the underlying JVM SparkContext) is:
sc._jsc.hadoopConfiguration().setInt("dfs.blocksize", some_value)
sc._jsc.hadoopConfiguration().setInt("parquet.block.size", some_value)
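For completeness, here is a minimal sketch of how this might look end to end, including the page size the question also asks about. parquet.page.size is the parquet-hadoop property for the data page size; the byte values, column names, and output path below are placeholders, not recommendations.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-sizes").getOrCreate()
sc = spark.sparkContext

# Row-group (block) size and data-page size, in bytes.
# 128 MB / 1 MB are illustrative values only.
sc._jsc.hadoopConfiguration().setInt("dfs.blocksize", 128 * 1024 * 1024)
sc._jsc.hadoopConfiguration().setInt("parquet.block.size", 128 * 1024 * 1024)
sc._jsc.hadoopConfiguration().setInt("parquet.page.size", 1 * 1024 * 1024)

# Any subsequent Parquet write picks up the Hadoop configuration.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").parquet("/tmp/example_parquet")  # placeholder path

Set these before the write is triggered; the configuration is read when the Parquet output format is initialized, not when the DataFrame is defined.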