
I need to read in a specific partition range using pyspark. I have seen various posts, such as this one, showing that when using Scala you can do the following:

val dataframe = sqlContext
  .read
  .parquet("file:///your/path/data=jDD/year=2015/month=10/day={5,6}/*")

val dataframe = sqlContext
  .read
  .parquet("file:///your/path/data=jDD/year=2015/month=10/day=[5-10]/*")

When using pyspark, the first method using {} brackets works and reads in the specific partitions, but I can't get the range method using [] to work.
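
For what it's worth, here is a minimal sketch of what I'm attempting in pyspark, using the same illustrative path as the Scala examples above:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# This works: {} enumerates specific day partitions
df = spark.read.parquet("file:///your/path/data=jDD/year=2015/month=10/day={5,6}/*")

# This is what fails for me: the [] range syntax
df = spark.read.parquet("file:///your/path/data=jDD/year=2015/month=10/day=[5-10]/*")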

I'm wondering is the syntax different for pyspark or is it just not supported?

Auren Ferguson
  • Could you share which range you are trying, exactly? The syntax is correct and it is supported by pyspark, as it just passes the path to the JVM as is. – D3V Jul 29 '19 at 23:26
  • https://stackoverflow.com/a/24036343/5986661 does this: `sc.textFile("/my/dir1,/my/paths/part-00[0-5]*,/another/dir,/a/specific/file")` – Omkar Neogi Jan 27 '20 at 22:11

0 Answers