Given a dataset with two columns:

| col1 | col2 |
|------|------|
|   1  |  2   |
|   2  |  2   |
|   1  |  2   |
|   1  |  2   |

I would like to add a third column, col3, containing the sum of col1 and col2:

| col1 | col2 | col3 |
|------|------|------|
|   1  |  2   |  3   |
|   2  |  2   |  4   |
|   1  |  2   |  3   |
|   1  |  2   |  3   |

I have found this question, which seems to do exactly the same thing, but in Scala.
Any tips?

JBoy

2 Answers

Assuming your data is in a DataFrame named df, you can get the desired output in either of the following ways:

  1. Using DataFrame operations
     df.select("col1", "col2", (df.col1 + df.col2).alias("col3")).show()
  2. Using Spark SQL
     df.createOrReplaceTempView("temp_data")
     spark.sql("select *, (col1 + col2) as col3 from temp_data").show()

Output:

+----+----+----+
|col1|col2|col3|
+----+----+----+
|   1|   2|   3|
|   2|   2|   4|
|   1|   2|   3|
|   1|   2|   3|
+----+----+----+
noufel13

You can create the new column in df with withColumn:

import org.apache.spark.sql.functions.col
// Add col3 as the row-wise sum of col1 and col2
val df1 = df.withColumn("col3", col("col1") + col("col2"))
df1.show()
Ravi