
I have a DataFrame which I want to extend with a new column. Creating a new DataFrame from Rows is explained here.

My current strategy is to construct new Rows with the RowFactory from the Rows that are passed into the map invoked by DataFrame.javaRDD().map(...), but I fear that this might incur unnecessary costs.

So I wonder whether, instead of creating new Rows, I can just extend the existing Rows by appending the new field. The Row interface doesn't seem to allow that.
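For what it's worth, a Row is backed by a fixed-size sequence of values, so even an "append" has to copy: any new Row built via RowFactory.create (or Row.fromSeq in Scala) allocates a fresh value array. A minimal plain-Scala sketch of that copy, with no Spark required (the row values and appended field are made-up examples):

```scala
// Hypothetical values of an existing row, plus a new field to append.
val rowValues: Array[Any] = Array("alice", 42)
val newField: Any = "extra"

// "Appending" means allocating a new array and copying every old value,
// which is essentially what RowFactory.create / Row.fromSeq do with the
// values you pass them.
val extended: Array[Any] = rowValues :+ newField

println(extended.mkString(", "))  // alice, 42, extra
```

So the cost of the map-over-Rows strategy is one array copy per row, which is hard to avoid with immutable Rows.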

code of Row

Jonathan

1 Answer


As @Sachin Janani mentions in a comment, you cannot modify a Row (it's immutable), but you can append a column to a DataFrame using the withColumn function. For instance, the code below adds a column with the length of the strings found in column "text":

// UDF that maps a string to its length; apply it to column "text"
// and store the result in a new column "text_length".
val stringLength = udf[Int, String](s => s.length)
val df2 = df1.withColumn("text_length", stringLength(df1("text")))

Hope this helps.

Glennie Helles Sindholt
  • Thanks, that's very close to what I want, but my udf would be very complicated (counting occurrences of certain words in another column's text). I'll definitely keep [withColumn](https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/DataFrame.html#withColumn%28java.lang.String,%20org.apache.spark.sql.Column%29) in mind. This [post](http://stackoverflow.com/questions/29483498/append-a-column-to-data-frame-in-apache-spark-1-3) is related. – Jonathan Jan 13 '16 at 12:58
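The UDF body can be arbitrary Scala, so counting word occurrences fits the same withColumn pattern as the length example. A hedged sketch of just the counting logic (the keyword set is a made-up assumption; no Spark is needed for this part):

```scala
// Hypothetical set of words to count occurrences of.
val keywords = Set("spark", "row")

// Count how many whitespace-separated tokens match a keyword,
// case-insensitively.
def countKeywords(text: String): Int =
  text.split("\\s+").count(w => keywords.contains(w.toLowerCase))

println(countKeywords("A Spark Row inside spark"))  // 3
```

Wiring it up is then the same as in the answer: `val countUdf = udf(countKeywords _)` followed by `df1.withColumn("keyword_count", countUdf(df1("text")))`; only the function body differs from the string-length example.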