
I have a DataFrame in which each record is a single string of 1000 comma-delimited fields, like:

"a,b,c,d,e.......up to 1000" - 1st record
"p,q,r,s,t ......up to 1000" - 2nd record

I am using the solution suggested in this Stack Overflow question:

Split 1 column into 3 columns in spark scala

df.withColumn("_tmp", split($"columnToSplit", "\\.")).select($"_tmp".getItem(0).as("col1"),$"_tmp".getItem(1).as("col2"),$"_tmp".getItem(2).as("col3")).drop("_tmp")

However, in my case I have 1000 columns, whose names are defined in a JSON schema that I can retrieve like this:

val column_seq: Seq[String] = Schema_func.map(_.name)
for (i <- 0 until column_seq.length) { println(i + " " + column_seq(i)) }

which prints:

0 col1
1 col2
2 col3
3 col4

Now I need to pass all these indexes and column names to the DataFrame expression below:

df.withColumn("_tmp", split($"columnToSplit", "\\.")).select($"_tmp".getItem(0).as("col1"),$"_tmp".getItem(1).as("col2"),$"_tmp".getItem(2).as("col3")).drop("_tmp")

specifically in this part:

$"_tmp".getItem(0).as("col1"),$"_tmp".getItem(1).as("col2"),

I can't write out one long statement covering all 1000 columns. Is there an effective way to pass all of these arguments from the JSON schema mentioned above to the select function, so that I can split the columns, add the header, and then convert the DataFrame to Parquet?

katty

1 Answer


You can build a series of org.apache.spark.sql.Column, where each one is the result of selecting the right item and has the right name, and then select these columns:

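import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.split
// `$` needs `import spark.implicits._` (already in scope in spark-shell)

val columns: Seq[Column] = Schema_func.map(_.name)
  .zipWithIndex // attach index to names
  .map { case (name, index) => $"_tmp".getItem(index) as name }

val result = df
  .withColumn("_tmp", split($"columnToSplit", "\\.")) // pattern is a regex; use "," for comma-delimited records
  .select(columns: _*) // selecting only the named columns also leaves _tmp out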

For example, for this input:

case class A(name: String)
val Schema_func = Seq(A("c1"), A("c2"), A("c3"), A("c4"), A("c5"))
val df = Seq("a.b.c.d.e").toDF("columnToSplit")

The result would be:

// +---+---+---+---+---+
// | c1| c2| c3| c4| c5|
// +---+---+---+---+---+
// |  a|  b|  c|  d|  e|
// +---+---+---+---+---+
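
For the comma-delimited records described in the question, the same idea can be sketched end to end, including the Parquet step. This is only a minimal sketch under a few assumptions: `columnNames` is a hypothetical stand-in for `Schema_func.map(_.name)`, the two sample rows are shortened to five fields, the local `SparkSession` setup is illustrative, and the output path is made up.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, split}

// Illustrative local session; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .appName("split-wide-record")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the names taken from the JSON schema.
val columnNames: Seq[String] = (1 to 5).map(i => s"col$i")

// Shortened sample of the comma-delimited records from the question.
val df = Seq("a,b,c,d,e", "p,q,r,s,t").toDF("columnToSplit")

// One Column per schema name: the array item at the matching index, renamed.
val columns: Seq[Column] = columnNames.zipWithIndex.map {
  case (name, index) => col("_tmp").getItem(index).as(name)
}

val result = df
  .withColumn("_tmp", split(col("columnToSplit"), ","))
  .select(columns: _*)

// Hypothetical output location.
result.write.mode("overwrite").parquet("/tmp/split_columns_parquet")
```

Because the select list is built from the schema names, the same code should work unchanged for 1000 names, as long as the order of names matches the order of fields in each record.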
Tzach Zohar