I've converted RDDs to Data Frame using case classes, but my current data has 700 columns. I've come across mentions of using structtypes but I can't find an example. Hope someone can share an example here. Thank you. Kevin
Asked
Active
Viewed 1,012 times
-2
-
It would help if you showed a reproducible example of what you want. – mtoto Aug 12 '16 at 10:52
-
As I understand you are using Scala right? – Thiago Baldim Aug 12 '16 at 11:00
-
5Possible duplicate of [How to convert rdd object to dataframe in spark](http://stackoverflow.com/questions/29383578/how-to-convert-rdd-object-to-dataframe-in-spark) – Alberto Bonsanto Aug 12 '16 at 11:03
-
You want to create a dynamic schema to your dataframe? – Thiago Baldim Aug 12 '16 at 17:46
1 Answers
-2
Here is an example with sample input using structType :
a,1,2.0
b,2,3.0
import org.apache.spark.sql.Row
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StructField
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.types.IntegerType
import org.apache.spark.sql.types.DoubleType
def getSchema(): StructType = {
val schema = new StructType(Array(
StructField("col_a", StringType, nullable = true),
StructField("col_b", IntegerType, nullable = true),
StructField("col_c", DoubleType, nullable = true)
))
schema
}
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val rdd = sc.textFile("/tmp/test").map(m => m.split(",", -1)).map(m => Row(m(0),m(1).toInt,m(2).toDouble))
val df = sqlContext.createDataFrame(rdd, getSchema)
df.show
+-----+-----+-----+
|col_a|col_b|col_c|
+-----+-----+-----+
| a| 1| 2.0|
| b| 2| 3.0|
+-----+-----+-----+

m-bhole
- 1,189
- 10
- 21
-
This is exactly what op has asked.... not sure why people are downvoting. Please leave the comment/reason for downvoting. – m-bhole Aug 13 '16 at 03:43
-
Thank you for your answer hadooper. I don't know why people down voted. There are many answers using case class, but that is limited to ~20 columns. I couldn't find a clear example on using the struct, so thank you so much for taking your time to answer my question. - kevin – kevin0259 Aug 13 '16 at 21:07
-
case classes are limited to 22 columns if you are using scala 2.10. This limitation has been removed from scala 2.11. So if you want to use case classes, you will have to use scala 2.11 – m-bhole Aug 14 '16 at 04:50