Here's what my dataframe looks like:
p3.show(false)
CLASS_NAME ID CREATED_BY
/SC/ABC/123/abc 123 david
/SC/DEF/456/ghi 456 hannah
... more rows to follow
what I'd like to do is: split CLASS_NAME
column and concatenate the first two string parts of it and form a new column: CLIENT_ID
and append to the dataframe, desired output is below:
CLASS_NAME ID CREATED_BY CLIENT_ID
/SC/ABC/123/abc xyz david /SC/ABC
/SC/DEF/456/ghi jfk hannah /SC/DEF
... more rows to follow
So far, I was able to follow this answer and split them using below command:
import org.apache.spark.sql.functions.split
import spark.implicits._
val p4 = p3.withColumn("CLIENT_ID", split($"CLASS_NAME", "\\/")).select(
$"CLIENT_ID".getItem(1).as("col1"),
$"CLIENT_ID".getItem(2).as("col2"),
$"CLIENT_ID".getItem(3).as("col3")
)
p4.show(false)
col1 col2 col3
SC ABC 123
SC DEF 456
... more rows to follow
But I haven't figured out how to 1. concatenate the two strings; 2. append this new column to the original dataframe.
Any ideas would be greatly appreciated!