
I have a DataFrame which I want to extend with a new column. Creating a new DataFrame from Rows is explained here.

My current strategy is to construct new Rows with the RowFactory from the Rows that are passed into the map invoked by DataFrame.javaRDD().map(...), but I fear that this might incur unnecessary costs.

So I wonder whether, instead of creating new Rows, I can just extend the existing Rows by appending the new field. The Row interface doesn't seem to allow that.
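For what it's worth, a Row is backed by a fixed-size sequence of values, so even an "append" has to copy: any new Row built via RowFactory.create (or Row.fromSeq in Scala) allocates a fresh value array. A minimal plain-Scala sketch of that copy, with no Spark required (the row values and appended field are made-up examples):

```scala
// Hypothetical values of an existing row, plus a new field to append.
val rowValues: Array[Any] = Array("alice", 42)
val newField: Any = "extra"

// "Appending" means allocating a new array and copying every old value,
// which is essentially what RowFactory.create / Row.fromSeq do with the
// values you pass them.
val extended: Array[Any] = rowValues :+ newField

println(extended.mkString(", "))  // alice, 42, extra
```

So the cost of the map-over-Rows strategy is one array copy per row, which is hard to avoid with immutable Rows.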

code of Row

Jonathan

1 Answer


As @Sachin Janani mentions in a comment, you cannot modify a Row (it's immutable), but you can append a column to a DataFrame using the withColumn function. For instance, the code below adds a column with the length of the strings found in column "text":

// UDF that maps a string to its length; apply it to column "text"
// and store the result in a new column "text_length".
val stringLength = udf[Int, String](s => s.length)
val df2 = df1.withColumn("text_length", stringLength(df1("text")))

Hope this helps.

Glennie Helles Sindholt
  • Thanks, that's very close to what I want, but my udf would be very complicated (counting occurrences of certain words in another column's text). I'll definitely keep [withColumn](https://spark.apache.org/docs/1.4.0/api/java/org/apache/spark/sql/DataFrame.html#withColumn%28java.lang.String,%20org.apache.spark.sql.Column%29) in mind. This [post](http://stackoverflow.com/questions/29483498/append-a-column-to-data-frame-in-apache-spark-1-3) is related. – Jonathan Jan 13 '16 at 12:58
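The UDF body can be arbitrary Scala, so counting word occurrences fits the same withColumn pattern as the length example. A hedged sketch of just the counting logic (the keyword set is a made-up assumption; no Spark is needed for this part):

```scala
// Hypothetical set of words to count occurrences of.
val keywords = Set("spark", "row")

// Count how many whitespace-separated tokens match a keyword,
// case-insensitively.
def countKeywords(text: String): Int =
  text.split("\\s+").count(w => keywords.contains(w.toLowerCase))

println(countKeywords("A Spark Row inside spark"))  // 3
```

Wiring it up is then the same as in the answer: `val countUdf = udf(countKeywords _)` followed by `df1.withColumn("keyword_count", countUdf(df1("text")))`; only the function body differs from the string-length example.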