This is how my pipelined RDD looks:

[([3.0, 12.0, 8.0, 49.0, 27.0], 7968.0),
 ([165.0, 140.0, 348.0, 615.0, 311.0], 165.0)]

I want to convert this to a DataFrame. I have tried splitting each record into two RDDs (one for the list in square brackets, one for the trailing value) and converting each to a DataFrame separately. I have also tried defining a schema and converting with it, but neither approach has worked. Can anybody help?

Thanks!

  • Have you tried `myrdd.toDF()`? You can also specify column names: `myrdd.toDF(["col1", "col2"])` – pault May 02 '18 at 14:45

1 Answer

You need to flatten your RDD before converting to a DataFrame:

df = rdd.map(lambda row: row[0] + [row[1]]).toDF()

You can specify the schema argument of toDF() to get meaningful column names and/or types.
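
As a rough sketch of the flatten-then-name approach, assuming each record is a `(list_of_features, label)` pair like the sample in the question: the helper names `flatten`, `to_flat_df` and `demo`, and the column names `f1` ... `f5` and `label`, are made up for illustration and are not part of Spark.

```python
def flatten(row):
    """Turn ([f1, ..., fn], label) into the flat list [f1, ..., fn, label]."""
    features, label = row
    return list(features) + [label]


def to_flat_df(rdd, feature_names, label_name="label"):
    """Flatten every record, then name the resulting columns.

    RDD.toDF() accepts a list of column names as its schema argument.
    It needs an active SparkSession to exist before it is called.
    """
    return rdd.map(flatten).toDF(list(feature_names) + [label_name])


def demo():
    """Build the sample RDD from the question and print the flattened schema."""
    # Imported here so the helpers above can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").getOrCreate()
    rdd = spark.sparkContext.parallelize([
        ([3.0, 12.0, 8.0, 49.0, 27.0], 7968.0),
        ([165.0, 140.0, 348.0, 615.0, 311.0], 165.0),
    ])
    # Hypothetical column names; pick whatever fits your data.
    df = to_flat_df(rdd, ["f1", "f2", "f3", "f4", "f5"])
    df.printSchema()
    spark.stop()
```

Calling `demo()` in an environment with PySpark installed should give one column per feature plus the label column. Skipping the `map(flatten)` step and calling `toDF(["features", "label"])` on the original RDD would instead keep the list as a single array column, so which form to use depends on what you need downstream.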

ags29
  • This is not true. You do not have to flatten the RDD first. You can call `toDF()` directly. – pault May 02 '18 at 14:45