
In EMR Spark, I have a HadoopRDD

org.apache.spark.rdd.RDD[(org.apache.hadoop.io.Text, org.apache.hadoop.dynamodb.DynamoDBItemWritable)] = HadoopRDD[0] at hadoopRDD

I want to convert this to a DataFrame (org.apache.spark.sql.DataFrame).

Does anyone know how to do this?

asked by Shweta

1 Answer

First convert it to simple types. Let's say your DynamoDBItemWritable has just one string column:

val simple: RDD[(String, String)] = rdd.map {
  case (text, dbwritable) => (text.toString, dbwritable.getString(0))
}

Then you can use toDF (via the SQLContext implicits) to get a DataFrame:

import sqlContext.implicits._
val df: DataFrame = simple.toDF()
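
If the implicits are not in scope, or you want named columns, an alternative sketch uses `createDataFrame` with an explicit schema (the column names `key` and `value` here are hypothetical, and `sqlContext` is assumed to exist as in the answer):

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical column names, chosen for illustration
val schema = StructType(Seq(
  StructField("key", StringType, nullable = false),
  StructField("value", StringType, nullable = true)
))

// Turn each (String, String) tuple into a Row matching the schema
val rows = simple.map { case (k, v) => Row(k, v) }
val named = sqlContext.createDataFrame(rows, schema)
```

This avoids relying on `import sqlContext.implicits._` and makes the resulting column names explicit.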
answered by Daniel Darabos
  • Thanks, but when I run the first command to convert it to a simpler type it gives me the following error: `scala> val simple: RDD[(String,String)] = orders.map { case (text, dbwritable) => (text.toString, dbwritable.getString(0)) }` → `error: value getString is not a member of org.apache.hadoop.dynamodb.DynamoDBItemWritable` – Shweta Jul 20 '16 at 16:02
  • 1
    [It's there in the source code.](https://github.com/Willet/Hadoop-DynamoDB/blob/7ba3df83b452e92c745c6dbc9012b9a0876756cf/src/main/java/com/willetinc/hadoop/mapreduce/dynamodb/io/DynamoDBItemWritable.java#L122) Maybe it is a different version? Anyway, just check the methods of your `DynamoDBItemWritable` and find one to read primitive types out of it. – Daniel Darabos Jul 21 '16 at 18:20
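
If your connector version has no `getString`, one possible approach is to go through the raw item map instead. This is only a sketch, assuming the emr-dynamodb-connector flavor of `DynamoDBItemWritable`, which exposes `getItem()` returning a `java.util.Map[String, AttributeValue]`; the attribute name `"id"` is hypothetical:

```scala
import scala.collection.JavaConverters._

val simple: RDD[(String, String)] = rdd.map {
  case (text, dbwritable) =>
    // Assumption: this DynamoDBItemWritable version exposes getItem(),
    // returning java.util.Map[String, AttributeValue]
    val item = dbwritable.getItem().asScala
    // "id" is a hypothetical attribute name; getS reads a string attribute
    (text.toString, item.get("id").map(_.getS).orNull)
}
```

From there, the `toDF()` step in the answer applies unchanged.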