
Is it possible to read only certain partitions from a folder using Spark?

I only know this way: df = spark.read.parquet("/mnt/Staging/file_Name/")

Is there any way to read only those partitions where the date is not less than today minus 3 months?

1 Answer


If your DataFrame is partitioned by date, you can just apply a filter on that column; thanks to partition pruning, Spark will read only the partitions that match the filter.

from pyspark.sql.functions import col

df = spark.read.parquet("/mnt/Staging/file_Name/").filter(col("your_date_col") == "2022-02-03")

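For the original question (reading only partitions from the last three months), the same partition-pruning idea works with a range filter instead of an equality check. A minimal PySpark sketch, assuming the data is partitioned by a date column named your_date_col (that column name is just a placeholder):

```python
from pyspark.sql import functions as F

# Reading the root path is fine: the filter on the partition column lets Spark
# prune partitions, so only folders from roughly the last 3 months are scanned.
df = (
    spark.read.parquet("/mnt/Staging/file_Name/")
         .filter(F.col("your_date_col") >= F.add_months(F.current_date(), -3))
)
```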
Danil
  • Do you know of any way I can overwrite certain partitions in a folder while leaving the other partitions unchanged? –  Apr 24 '22 at 14:24
  • @Denis If you have Spark 2.3+ then you can use dynamic partition overwrite: [overwrite-specific-partitions-in-spark-dataframe-write-method](https://stackoverflow.com/questions/38487667/overwrite-specific-partitions-in-spark-dataframe-write-method) (a short sketch follows below). – Danil Apr 25 '22 at 19:42
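A minimal sketch of the dynamic partition overwrite mentioned in the comment (Spark 2.3+), assuming updates_df is a hypothetical DataFrame that contains only the partitions you want to rewrite; in dynamic mode Spark replaces just those partitions and leaves the rest of the folder untouched:

```python
# Dynamic mode: only partitions that receive data in this write are overwritten.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

(
    updates_df                          # hypothetical DataFrame with the partitions to replace
        .write
        .mode("overwrite")
        .partitionBy("your_date_col")   # same partition column as above
        .parquet("/mnt/Staging/file_Name/")
)
```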