
How do I specify the Parquet block size and page size in PySpark? I have searched everywhere but cannot find any documentation for the function calls or the imports.

Utkarsh

1 Answer


According to the spark-user mailing list archives, in Scala:

sc.hadoopConfiguration.setInt("dfs.blocksize", some_value)
sc.hadoopConfiguration.setInt("parquet.block.size", some_value)

so in PySpark:

sc._jsc.hadoopConfiguration().setInt("dfs.blocksize", some_value)
sc._jsc.hadoopConfiguration().setInt("parquet.block.size", some_value)
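Since the question also asks about page size, here is a fuller sketch using the same Hadoop configuration approach. It assumes the standard Parquet key parquet.page.size is honored by your Parquet version; the byte values (128 MB row groups, 1 MB pages) and the output path are illustrative placeholders, not recommendations.

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-sizes").getOrCreate()
sc = spark.sparkContext

# All sizes are in bytes; set them before writing so the Parquet writer picks them up.
hadoop_conf = sc._jsc.hadoopConfiguration()
hadoop_conf.setInt("dfs.blocksize", 128 * 1024 * 1024)       # HDFS block size
hadoop_conf.setInt("parquet.block.size", 128 * 1024 * 1024)  # Parquet row group size
hadoop_conf.setInt("parquet.page.size", 1 * 1024 * 1024)     # Parquet page size

# Subsequent Parquet writes use these settings; "some_output_path" is a placeholder.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.parquet("some_output_path")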