
I have a Dataset<Row> in Spark like this:

+----+-------+
| age|   name|
+----+-------+
|  15|Michael|
|  30|   Andy|
|  19| Justin|
+----+-------+

Now I want to add a column whose value is the string value of age concatenated with the string value of name, like this:

+----+-------+-----------+
| age|   name|cbdkey     |
+----+-------+-----------+
|  15|Michael|  15Michael|
|  30|   Andy|  30Andy   |
|  19| Justin|  19Justin |
+----+-------+-----------+

I tried:

df.withColumn("cbdkey",col("age").+(col("name"))).show()

But every value in the new cbdkey column is null. How should I do this? Thanks in advance.

zpwpal

2 Answers


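`+` on a Column is numeric addition, so Spark tries to cast name to a number; a value like "Michael" cannot be cast, the cast yields null (with ANSI mode off), and so the whole sum is null. To join the values as strings, use the concat function: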

import org.apache.spark.sql.functions.{col, concat}

df.withColumn("cbdkey", concat(col("age"), col("name"))).show()
+---+-------+---------+
|age|   name|   cbdkey|
+---+-------+---------+
| 15|Michael|15Michael|
| 30|   Andy|   30Andy|
| 19| Justin| 19Justin|
+---+-------+---------+
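A minimal sketch contrasting the question's `+` attempt with concat on the same three rows. It assumes a local Spark session with spark-sql on the classpath; the names ConcatVsPlus and keys are illustrative, and the data is rebuilt from a Scala Seq rather than read from the asker's source. ANSI mode is switched off explicitly because with it on (the default since Spark 4.0) the failed cast of "Michael" to a number raises an error instead of producing null.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat}

object ConcatVsPlus {
  // Returns the cbdkey values produced by `+` and by concat, in row order.
  def keys(spark: SparkSession): (Seq[Any], Seq[String]) = {
    import spark.implicits._
    // With ANSI mode on, the failed cast below would throw instead of yielding null.
    spark.conf.set("spark.sql.ansi.enabled", "false")

    // Hypothetical rebuild of the question's data (age as an Int column).
    val df = Seq((15, "Michael"), (30, "Andy"), (19, "Justin")).toDF("age", "name")

    // `+` is numeric: name is cast to double, "Michael" -> null, so the sum is null.
    val plus = df.withColumn("cbdkey", col("age") + col("name"))
    // concat is string concatenation: age is implicitly cast to string first.
    val joined = df.withColumn("cbdkey", concat(col("age"), col("name")))

    (plus.select("cbdkey").collect().map(_.get(0)).toSeq,
     joined.select("cbdkey").collect().map(_.getString(0)).toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("concat-vs-plus").getOrCreate()
    val (plus, joined) = keys(spark)
    println(plus)   // every entry is null
    println(joined) // 15Michael, 30Andy, 19Justin
    spark.stop()
  }
}
```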

If you need to specify a custom separator, use concat_ws:

import org.apache.spark.sql.functions.{col, concat_ws}

df.withColumn("cbdkey", concat_ws(",", col("age"), col("name"))).show()
+---+-------+----------+
|age|   name|    cbdkey|
+---+-------+----------+
| 15|Michael|15,Michael|
| 30|   Andy|   30,Andy|
| 19| Justin| 19,Justin|
+---+-------+----------+
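One difference worth knowing: concat returns null as soon as any input is null, while concat_ws skips null inputs. The sketch below shows this on hypothetical data where the last row has a null name; it assumes a local Spark session, and the names NullHandling, withKeys, concatKey and wsKey are illustrative. age is cast to string explicitly so the example does not rely on implicit type coercion.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, concat_ws}

object NullHandling {
  // Adds both kinds of key to hypothetical data whose last row has a null name.
  def withKeys(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq[(Int, Option[String])]((15, Some("Michael")), (30, Some("Andy")), (19, None))
      .toDF("age", "name")

    // Explicit cast, so this does not depend on implicit int -> string coercion.
    val age = col("age").cast("string")
    df.withColumn("concatKey", concat(age, col("name")))       // null if any input is null
      .withColumn("wsKey", concat_ws(",", age, col("name")))   // null inputs are skipped
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("null-handling").getOrCreate()
    withKeys(spark).show()
    spark.stop()
  }
}
```

For the last row, concatKey comes out null while wsKey is just "19".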
Psidom

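Another way is to write a UDF (user-defined function) and call it on the DataFrame. Built-in functions such as concat are usually preferable, though, since Spark's optimizer cannot look inside a UDF.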

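import org.apache.spark.sql.functions.{col, udf}

// Assumes age is an integer column; builds the key with string interpolation
// (Int + String is deprecated since Scala 2.13).
val concatUDF = udf { (age: Int, name: String) =>
  s"$age$name"
}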

df.withColumn("cbdkey", concatUDF(col("age"), col("name"))).show()

output:

+---+-------+---------+
|age|   name|   cbdkey|
+---+-------+---------+
| 15|Michael|15Michael|
| 30|   Andy|   30Andy|
| 19| Justin| 19Justin|
+---+-------+---------+
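A caveat with this UDF: Spark passes a null name through as a Scala null, and string interpolation renders it as the text "null". A sketch of a null-safe variant, assuming a local Spark session and hypothetical data with a null name in the last row (UdfNulls, naiveUDF, safeUDF and keys are illustrative names):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

object UdfNulls {
  // Naive version, as in the answer: a null name becomes the text "null".
  val naiveUDF = udf { (age: Int, name: String) => s"$age$name" }

  // Null-safe version: a null name gives a null key, like concat would.
  val safeUDF = udf { (age: Int, name: String) =>
    Option(name).map(n => s"$age$n").orNull
  }

  // Returns (naive, safe) keys per row, in row order.
  def keys(spark: SparkSession): Seq[(String, String)] = {
    import spark.implicits._
    // Hypothetical data: the last row has a null name.
    val df = Seq[(Int, Option[String])]((15, Some("Michael")), (19, None)).toDF("age", "name")
    df.select(naiveUDF(col("age"), col("name")), safeUDF(col("age"), col("name")))
      .collect()
      .map(r => (r.getString(0), r.getString(1)))
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("udf-nulls").getOrCreate()
    keys(spark).foreach(println)
    spark.stop()
  }
}
```

For the null row the naive UDF yields "19null", while the Option-based version yields null.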
Prasad Khode