
In a DataFrame in Apache Spark (I'm using the Scala interface), when I iterate over its Row objects, is there any way to extract struct values by name?

I am using the code below to extract values by name, but I can't figure out how to read the struct value.

If the value had been of type String, we could have done this:

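 // toDF needs the implicits in scope, e.g. import spark.implicits._ (sqlContext.implicits._ on Spark 1.x)
 val resultDF = joinedDF.rdd.map { row =>
   val id = row.getAs[Long]("id")
   val values = row.getAs[String]("slotSize")   // holds the name of another column
   val fields = row.getAs[String](values)       // look that column up by name
   (id, values, fields)
 }.toDF("id", "values", "fields")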

But in my case the value is a struct with the following schema:

 |-- v1: struct (nullable = true)
 |    |-- level1: string (nullable = true)
 |    |-- level2: string (nullable = true)
 |    |-- level3: string (nullable = true)
 |    |-- level4: string (nullable = true)
 |    |-- level5: string (nullable = true)

What should I replace this line with so that the code works, given that the value has the structure above?

  row.getAs[String](values)
satyambansal117
  • Why are you doing df => rdd => df? Looks like your transformation could be expressed with DataFrame operations and save you a lot of trouble in the process. – maasg Nov 10 '16 at 11:09
  • Because I have to do some row-wise computations, and I need this transformation to traverse the DataFrame row by row. – satyambansal117 Nov 10 '16 at 11:13
  • What kind of row-wise computations? – maasg Nov 10 '16 at 11:20
  • You can look into this http://stackoverflow.com/questions/40502085/transforming-two-dataframes-in-spark-sql/40514493?noredirect=1#comment68285387_40514493 – satyambansal117 Nov 10 '16 at 11:25
  • Doesn't look like you've put much effort in between the previous question and this one. The answer for this one lies here: https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/sql/Row.html – maasg Nov 10 '16 at 11:37
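maasg's point that this could be done with DataFrame operations can be sketched as follows. This is a hypothetical alternative, not something posted in the thread: it assumes slotSize holds the name of one of v1's fields (for example "level3"), and the object name StructFieldWithoutRdd, the Levels case class, and the sample rows are illustrative only. Because the struct field to read changes from row to row, it builds a when/otherwise chain over the five field names listed in the question's schema.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object StructFieldWithoutRdd {
  // Hypothetical case class mirroring the question's v1 struct
  case class Levels(level1: String, level2: String, level3: String,
                    level4: String, level5: String)

  // Field names of v1, copied from the question's schema
  val levelNames = Seq("level1", "level2", "level3", "level4", "level5")

  // Nested when/otherwise chain: yields v1.<value of slotSize>,
  // or null when slotSize matches none of the field names
  def selectedLevel: Column =
    levelNames.foldLeft(lit(null).cast("string")) { (otherwise, name) =>
      when(col("slotSize") === name, col(s"v1.$name")).otherwise(otherwise)
    }

  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Sample data; assumes slotSize names a field inside v1
    val joinedDF = Seq(
      (1L, "level2", Levels("a1", "a2", "a3", "a4", "a5")),
      (2L, "level5", Levels("b1", "b2", "b3", "b4", "b5"))
    ).toDF("id", "slotSize", "v1")

    // No DataFrame -> RDD -> DataFrame round trip
    joinedDF.select(col("id"), col("slotSize").as("values"), selectedLevel.as("fields"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("struct-no-rdd").getOrCreate()
    run(spark).show()
    spark.stop()
  }
}
```

Keeping the logic in column expressions lets Spark optimize the query; the trade-off is that the candidate field names must be known up front, here hard-coded from the schema.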

1 Answer


You can access the struct elements by first extracting another Row from the top-level Row (structs are modeled as nested Rows in Spark), like this, where "struct" stands for the name of the struct column (v1 in the question):

Scala Implementation

val level1 = row.getAs[Row]("struct").getAs[String]("level1")

Java Implementation

 String level1 = row.<Row>getAs("struct").<String>getAs("level1");
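The Scala one-liner applied to the question's own map might look like the sketch below. It assumes, since the question does not say so explicitly, that slotSize holds the name of one of v1's fields (for example "level3"); the object name StructFieldByName, the Levels case class, and the sample rows are hypothetical and only there to make the example self-contained. It needs spark-sql on the classpath and runs against a local session.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object StructFieldByName {
  // Hypothetical case class mirroring the question's v1 struct
  case class Levels(level1: String, level2: String, level3: String,
                    level4: String, level5: String)

  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Sample data; assumes slotSize holds the name of a field inside v1
    val joinedDF = Seq(
      (1L, "level2", Levels("a1", "a2", "a3", "a4", "a5")),
      (2L, "level5", Levels("b1", "b2", "b3", "b4", "b5"))
    ).toDF("id", "slotSize", "v1")

    joinedDF.rdd.map { row =>
      val id = row.getAs[Long]("id")
      val values = row.getAs[String]("slotSize")
      // The struct column comes back as a nested Row; v1 is nullable, so guard it
      val fields = Option(row.getAs[Row]("v1"))
        .map(_.getAs[String](values))
        .orNull
      (id, values, fields)
    }.toDF("id", "values", "fields")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("struct-by-name").getOrCreate()
    run(spark).show()
    spark.stop()
  }
}
```

The Option wrapper is there because the schema marks v1 as nullable, and calling getAs on a null struct would throw a NullPointerException. Note that getAs by name also throws an IllegalArgumentException if the value of slotSize is not a field of v1.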
Yashwanth Kambala
Raphael Roth