
I have a collection containing three DataFrames of the same type (same Parquet schema). They only differ in the values they contain:

nested structure

I want to flatten the structure, so that the three DataFrames get merged into one single DataFrame containing all of the values.

I tried it with flatten and flatMap, but with both I always get this error:

```
Error: No implicit view available from org.apache.spark.sql.DataFrame => Traversable[U].
    parquetFiles.flatten
Error: not enough arguments for method flatten: (implicit asTrav: org.apache.spark.sql.DataFrame => Traversable[U], implicit m: scala.reflect.ClassTag[U])Unspecified value parameters asTrav, m.
    parquetFiles.flatten
```

I also converted it to a List and then tried to flatten it, which produces the same error. Do you have any idea how to solve this, or what the problem is here? Thanks, Alex

AlexL

2 Answers


The Scala compiler is looking for a way to convert the DataFrames to a Traversable so it can apply the flatten, but a DataFrame is not Traversable, so it fails. There is also no ClassTag available, because DataFrames are not statically typed.

The code you're looking for is

parquetFiles.reduce(_ unionAll _)

which can be optimized by the DataFrame execution engine.
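Putting that one-liner in context, a minimal sketch (the Parquet paths and the `sqlContext` setup are assumptions for illustration; this matches the Spark 1.x API the question uses, where in Spark 2.x `unionAll` is deprecated in favor of `union`):

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical input paths; any Seq of same-schema DataFrames works.
val paths = Seq("data/a.parquet", "data/b.parquet", "data/c.parquet")
val parquetFiles: Seq[DataFrame] = paths.map(sqlContext.read.parquet)

// Pairwise unionAll over the collection: ((df1 unionAll df2) unionAll df3).
// All DataFrames must share the same schema.
val merged: DataFrame = parquetFiles.reduce(_ unionAll _)
```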

Reactormonk
  • Thanks a lot Reactormonk, that's working fine. Just one addition: for me the code had to look like this: `parquetFiles.reduce(_ unionAll(_))`. Thanks! – AlexL Oct 30 '15 at 12:02

So it seems like you want to join these three DataFrames together; to do that, the unionAll function would work really well. You could do `parquetFiles.reduce((x, y) => x.unionAll(y))` (note this will explode on an empty list, so if you might have one, look at one of the folds instead of reduce).
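A sketch of the fold variant mentioned above, assuming an empty DataFrame with the matching schema is available to seed the fold (the `emptyDF` parameter here is a hypothetical helper, not part of the question's code):

```scala
import org.apache.spark.sql.DataFrame

// Seeding foldLeft with an empty DataFrame of the same schema (assumption)
// means an empty input collection yields an empty result instead of throwing,
// unlike reduce, which requires a non-empty collection.
def mergeAll(parquetFiles: Seq[DataFrame], emptyDF: DataFrame): DataFrame =
  parquetFiles.foldLeft(emptyDF)((acc, df) => acc.unionAll(df))
```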

Holden
  • Thanks a lot Holden! That is also working nicely and is exactly what I was looking for. Also like your work with spark-testing-base. Keep it up! :) Thanks – AlexL Oct 30 '15 at 12:06
  • Thanks so much for the kind words, I'm really glad spark-testing-base is working out well for you :) (P.S. If you have any feature requests for it, please create a GitHub issue and I'll try to take a look :)) – Holden Oct 30 '15 at 18:03