
I have a Parquet file that is partitioned by YEAR/MONTH/DAY.

From what I know, I can read it this way for a specific date:

sqlContext
     .read
     .option("basePath", "file:///path/")
     .parquet("file:///path/YEAR=2015/MONTH=10/DAY=5/") 

But how can I read all partitions from a start date to an end date?

Thanks,

Alex

1 Answer


You have to read the whole dataset from its base path (file:///path) and then apply a .where() filter on the partition columns.

Spark will also "push down" that filter to the I/O level and read only the partitions that are required.
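A minimal sketch of that approach in Scala, assuming Spark 2.x/3.x with a `SparkSession` (`spark.read` plays the role of `sqlContext.read` above) and integer partition values without leading zeros (e.g. `DAY=5`, as in the question). The object and method names (`ReadDateRange`, `readRange`, `demoIds`) and the tiny demo dataset are hypothetical. Instead of concatenating and parsing, it encodes each partition as a `yyyymmdd` integer (`YEAR * 10000 + MONTH * 100 + DAY`), whose numeric order matches calendar order, so a date range becomes one `between`. Because the predicate references only partition columns, Spark should be able to evaluate it against the directory values; `explain()` shows it under `PartitionFilters` if it does.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ReadDateRange {
  // Reads the whole dataset from its base path and keeps only rows whose
  // YEAR/MONTH/DAY partition values fall in [start, end], with both bounds
  // given as yyyymmdd integers (e.g. 20151001).
  def readRange(spark: SparkSession, basePath: String, start: Int, end: Int): DataFrame = {
    // 2015/10/5 -> 20151005: integer order matches calendar order.
    val dateKey = col("YEAR") * 10000 + col("MONTH") * 100 + col("DAY")
    spark.read.parquet(basePath).where(dateKey.between(start, end))
  }

  // Hypothetical demo: writes a tiny YEAR=/MONTH=/DAY= dataset to a temp
  // directory, reads October 2015 back and returns the matching ids.
  def demoIds(): Array[Int] = {
    val spark = SparkSession.builder().master("local[1]").appName("date-range").getOrCreate()
    import spark.implicits._
    try {
      val base = Files.createTempDirectory("parquet-demo").resolve("data").toString
      Seq((1, 2015, 9, 30), (2, 2015, 10, 5), (3, 2015, 11, 1))
        .toDF("id", "YEAR", "MONTH", "DAY")
        .write
        .partitionBy("YEAR", "MONTH", "DAY")
        .parquet(base)

      val inRange = readRange(spark, base, 20151001, 20151031)
      inRange.explain() // the FileScan node should list the predicate under PartitionFilters
      inRange.select("id").as[Int].collect().sorted
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit =
    println(demoIds().mkString(",")) // prints 2
}
```

Against the question's layout you would call something like `readRange(spark, "file:///path/", 20150101, 20151031)`. If your directories use zero-padded values (`MONTH=05`), check the inferred column types with `printSchema()` before relying on the arithmetic.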

Til Piffl
  • Thanks for your help, I appreciate it. But if I want to compare with a date like "2015-01-01", I need to concatenate all the columns and then parse the result as a date, right? – Alex Jan 20 '22 at 16:05
  • Hm, yes, that's right. If you keep the partitioning as is, it will be hard to define an efficient date range filter. You could partition by a date column, though. – Til Piffl Jan 20 '22 at 16:12
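For the "partition by a date column" suggestion, a minimal sketch, again assuming Spark 2.x/3.x in Scala; the object name `PartitionByDate`, the column name `DATE` and the demo rows are hypothetical. It builds a real `DateType` column from the three parts once at write time (formatted as `yyyy-MM-dd`, then cast), writes it as the only partition column, and then filters with ordinary date bounds such as "2015-01-01". It relies on Spark inferring `DATE=2015-10-05` directory values as dates when reading back; `printSchema()` will confirm that on your setup.

```scala
import java.nio.file.Files
import java.sql.Date
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, format_string}

object PartitionByDate {
  // Hypothetical demo: re-partitions YEAR/MONTH/DAY rows by a single DATE
  // column, then reads a date range back and returns the matching ids.
  def demoIds(): Array[Int] = {
    val spark = SparkSession.builder().master("local[1]").appName("partition-by-date").getOrCreate()
    import spark.implicits._
    try {
      val dateBase = Files.createTempDirectory("parquet-by-date").resolve("data").toString

      Seq((1, 2015, 9, 30), (2, 2015, 10, 5), (3, 2015, 11, 1))
        .toDF("id", "YEAR", "MONTH", "DAY")
        // Build "2015-10-05" and cast it to a DateType column.
        .withColumn("DATE",
          format_string("%04d-%02d-%02d", col("YEAR"), col("MONTH"), col("DAY")).cast("date"))
        .drop("YEAR", "MONTH", "DAY")
        .write
        .partitionBy("DATE") // directories look like DATE=2015-10-05
        .parquet(dateBase)

      // With a single date partition column, the range is a plain comparison.
      spark.read.parquet(dateBase)
        .where(col("DATE").between(Date.valueOf("2015-01-01"), Date.valueOf("2015-10-31")))
        .select("id").as[Int].collect().sorted
    } finally spark.stop()
  }

  def main(args: Array[String]): Unit =
    println(demoIds().mkString(",")) // prints 1,2
}
```

The trade-off is a one-time rewrite of the data; after that, range queries no longer need any per-row date arithmetic.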