Suppose I have a Spark Scala DataFrame object like:

+----+----+
|col1|col2|
+----+----+
|1   |2   |
|3   |4   |
+----+----+

And I want a DataFrame like:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|1   |2   |3   |
|3   |4   |7   |
+----+----+----+

Here col3 is the sum of col1 and col2. Could anyone please tell me how to do that? `withColumn` seems to take only one column as a parameter, whereas I need to use two columns.

Yiming Sun
  • Considering that `withColumn` is lazy and will be optimized at runtime, I'm fairly sure you can call it twice for the two columns without any problem. – stefanobaghino Jun 15 '18 at 06:17
  • Possible duplicate of [Adding a column of rowsums across a list of columns in Spark Dataframe](https://stackoverflow.com/questions/37624699/adding-a-column-of-rowsums-across-a-list-of-columns-in-spark-dataframe) – Ramesh Maharjan Jun 15 '18 at 09:58

2 Answers


You can use `withColumn` or `select`, as follows:

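// Assumes a SparkSession named `spark` is in scope (as in spark-shell)
import spark.implicits._ // enables toDF and the $"..." column syntax

val df = Seq(
  (1, 2),
  (3, 4)
).toDF("col1", "col2")

// Option 1: append a column computed from two existing columns
df.withColumn("col3", $"col1" + $"col2").show(false)

// Option 2: select the existing columns plus the computed one
df.select($"col1", $"col2", ($"col1" + $"col2").as("col3")).show(false)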

Output:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|1   |2   |3   |
|3   |4   |7   |
+----+----+----+
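
As a possible alternative to the `$"..."` column syntax, the same sum can be written as a SQL expression string. The sketch below is illustrative only: it assumes a local `SparkSession` (the `master` and `appName` values are placeholders), and the names `withExpr` and `withSelectExpr` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("expr-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, 2), (3, 4)).toDF("col1", "col2")

// expr parses a SQL expression string into a Column
val withExpr = df.withColumn("col3", expr("col1 + col2"))

// selectExpr accepts SQL expression strings directly
val withSelectExpr = df.selectExpr("col1", "col2", "col1 + col2 AS col3")

withExpr.show(false)
withSelectExpr.show(false)
```

Both variants should produce the same three-column result as the table above; which one to use is mostly a matter of taste.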
koiralo

`withColumn` takes two parameters: a column name and an expression of type `Column`. Any function or expression whose result is a `Column` is valid, so an expression that combines two columns works. Hence you can do the following (or similar):

df.withColumn("col3", df("col1")+df("col2")) 
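
Because the second argument is just a `Column` expression, the same idea should extend to more than two columns. The following is a minimal sketch, assuming a local `SparkSession` (placeholder `master` and `appName` values); the list `colsToSum` and the names `total` and `result` are hypothetical, introduced only for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("row-sum-sketch")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, 2), (3, 4)).toDF("col1", "col2")

// Hypothetical list of column names to add up; extend as needed.
val colsToSum = Seq("col1", "col2")

// Build a single Column expression: col("col1") + col("col2") + ...
val total = colsToSum.map(c => col(c)).reduce(_ + _)

// Pass the combined expression as the second argument of withColumn
val result = df.withColumn("col3", total)
result.show(false)
```

Note that `reduce` throws on an empty list, so this sketch assumes `colsToSum` has at least one entry.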
Arnon Rotem-Gal-Oz