I have a DataFrame with a single column whose values are strings joined by a delimiter. I want to split that column into multiple columns, potentially 1000-2000 of them, and the DataFrame can hold around 60 million records. I am looking for the best approach so that performance is not hurt.
Below is my current approach. Can anyone suggest a better way to achieve this?
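```scala
import org.apache.spark.sql.functions.{col, split}
import spark.implicits._ // assumes an existing SparkSession named `spark`

val df = Seq("1|2|3|4|5|6|7|8|9").toDF("data")
// split() takes a regex, so the pipe delimiter has to be escaped
val df2 = df.withColumn("_tmp", split(col("data"), "\\|"))
df2.select(
  $"_tmp".getItem(0).as("col1"),
  $"_tmp".getItem(1).as("col2"),
  $"_tmp".getItem(2).as("col3"),
  $"_tmp".getItem(3).as("col4"))
```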
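With 1000-2000 target columns, writing each `getItem` by hand is not practical. The sketch below keeps the same `split` + `getItem` technique but generates the select list in a loop. `SplitToColumns`, `splitToColumns` and `inferNumCols` are hypothetical helper names, and the local `SparkSession` exists only to make the example runnable. `Pattern.quote` is used so an arbitrary delimiter is matched literally rather than as a regex. If some rows have fewer fields than `numCols`, `getItem` returns null for the missing positions with ANSI mode off, but may raise an error when ANSI mode is enabled, so check your Spark configuration.

```scala
import java.util.regex.Pattern

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, size, split}

object SplitToColumns {
  // split() treats its pattern as a regex, so quote the delimiter to match it
  // literally ("|" on its own would be read as regex alternation).
  private def parts(srcCol: String, delimiter: String) =
    split(col(srcCol), Pattern.quote(delimiter))

  // Builds col1..colN with one getItem per position instead of listing them by hand.
  def splitToColumns(df: DataFrame, srcCol: String, delimiter: String, numCols: Int): DataFrame = {
    val cols = (0 until numCols).map(i => col("_tmp").getItem(i).as(s"col${i + 1}"))
    df.withColumn("_tmp", parts(srcCol, delimiter)).select(cols: _*)
  }

  // Optional: derive N from the data. This is an extra full pass over the
  // DataFrame, so skip it when the column count is already known.
  def inferNumCols(df: DataFrame, srcCol: String, delimiter: String): Int = {
    val row = df.select(max(size(parts(srcCol, delimiter)))).head()
    if (row.isNullAt(0)) 0 else math.max(0, row.getInt(0))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("split-demo").getOrCreate()
    import spark.implicits._
    val df = Seq("1|2|3|4|5|6|7|8|9").toDF("data")
    val n = inferNumCols(df, "data", "|")
    splitToColumns(df, "data", "|", n).show()
    spark.stop()
  }
}
```

Calling `.explain()` on the result of `splitToColumns` shows the physical plan Spark actually chose for the generated select list, which is worth checking on a sample before running it on the full 60 million rows.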
Thanks a lot in advance.