
I have a spark dataframe of the below format:

     +--------------------+
     |value               |
     +--------------------+
     |Id,date             |
     |000027,2017-11-14   |
     |000045,2017-11-15   |
     |000056,2018-09-09   |
     |C000056,2018-07-01  |
     +--------------------+

I need to loop through each row, split it by comma (,), and place the values in separate columns (Id and date as two separate columns).

I am new to Spark and not sure whether this could be done through a lambda function. Any suggestions would be appreciated.

user3447653
  • How did your data get into this format (with the headers as a row)? How are you creating this DataFrame? – pault Aug 10 '18 at 18:26
  • 2
    Possible duplicate of [Split Spark Dataframe string column into multiple columns](https://stackoverflow.com/questions/39235704/split-spark-dataframe-string-column-into-multiple-columns) – pault Aug 10 '18 at 18:26

1 Answer

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("Demo").getOrCreate()
import spark.implicits._  // needed for the toDF conversion below

val df = Seq("a,b,c,f", "d,f,g,h").toDF("value")
df.show()  // show the DataFrame
+-------+
|  value|
+-------+
|a,b,c,f|
|d,f,g,h|
+-------+

// split each row on the "," delimiter and build an RDD[Row]
val rdd = df.rdd.map(x => Row(x.getString(0).split(","): _*))
val schema = StructType(Array("name", "class", "rank", "grade").map(x => StructField(x, StringType, true)))
spark.createDataFrame(rdd, schema).show()
+----+-----+----+-----+
|name|class|rank|grade|
+----+-----+----+-----+
|   a|    b|   c|    f|
|   d|    f|   g|    h|
+----+-----+----+-----+
Gagan Sp
  • This is a code-only answer to a duplicate, and the code isn't even in the language requested by the OP. – pault Aug 10 '18 at 18:58