
I need to read in specific parquet files with spark, I know this can be done like so:

sqlContext
    .read
    .parquet("s3://bucket/key", "s3://bucket/key")

Right now I have a List[String] object containing all of these s3 paths, but I don't know how to pass it programmatically to the parquet function in Scala. There are way too many files to list them manually — any ideas how to get the files into the parquet function programmatically?

moku

1 Answer


I've answered a similar question about repeated parameters here.

As @Dima mentioned, you are looking for the splat operator, because .parquet expects repeated arguments:

sqlContext.read.parquet(listOfStrings:_*)

More on repeated arguments in the Scala Language Specification, section 4.6.2.

Although that is the specification for Scala 2.9, this part hasn't changed.
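To illustrate the mechanism without a Spark cluster, here is a minimal sketch using a plain varargs function as a hypothetical stand-in for DataFrameReader.parquet — the function name and paths are made up, but the `: _*` syntax is exactly what you'd use on sqlContext.read.parquet:

```scala
// Hypothetical stand-in for DataFrameReader.parquet(paths: String*):
// any Scala method declared with a trailing `*` accepts repeated arguments.
def parquet(paths: String*): Seq[String] = paths.toSeq

val listOfPaths: List[String] = List("s3://bucket/a", "s3://bucket/b")

// `: _*` (the "splat" / sequence-argument syntax) expands the List
// into the repeated String* parameters, one argument per element.
val result = parquet(listOfPaths: _*)
```

Without the `: _*` ascription, passing the List directly would be a type error, since a single List[String] does not match the repeated String* parameter.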

eliasah