
My dataset has two columns, key (string) and value (long).

The values in the key column look like prefix.20171012.111.2222, and the values in the value column look like 9999.

I want to transform the dataset into a new one that splits the key column into several columns, like "day, rt, item_id, value".

How can I do this? Thanks a lot.

1 Answer

// input ds looks like this
+--------+-----+
|     key|value|
+--------+-----+
|20171011| 9999|
+--------+-----+
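If you want to reproduce this example input locally, a minimal sketch (assuming an active SparkSession named spark) could be:

// construct the example input shown above
import spark.implicits._
val ds = Seq(("20171011", 9999L)).toDF("key", "value")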

// import the functions you need (the $ column syntax also needs the implicits of your SparkSession, here assumed to be named spark)
import org.apache.spark.sql.functions.{to_date, month, year, dayofmonth}
import spark.implicits._

// ds2: parse the key string into a proper date column
val ds2 = ds.withColumn("date", to_date($"key", "yyyyMMdd"))

// ds2.show()
+--------+-----+----------+
|     key|value|      date|
+--------+-----+----------+
|20171011| 9999|2017-10-11|
+--------+-----+----------+

// ds3: derive year, month and day-of-month from the date column
val ds3 = ds2.withColumn("Month", month($"date"))
  .withColumn("Year", year($"date"))
  .withColumn("Date", dayofmonth($"date")) // with the default case-insensitive resolution this replaces the "date" column, as the output below shows

// ds3.show()
+--------+-----+----+-----+----+
|     key|value|Date|Month|Year|
+--------+-----+----+-----+----+
|20171011| 9999|  11|   10|2017|
+--------+-----+----+-----+----+
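Note that the keys in the question are dot-separated strings like prefix.20171012.111.2222 rather than plain dates. A minimal sketch of the same idea using split (assuming the second segment is the day, the third is rt and the fourth is item_id; adjust the indices to your actual layout):

// split the dot-separated key and pick out the segments you need
import org.apache.spark.sql.functions.split
import spark.implicits._
val parts = split($"key", "\\.")
val result = ds
  .withColumn("day", parts.getItem(1))
  .withColumn("rt", parts.getItem(2))
  .withColumn("item_id", parts.getItem(3))
  .select("day", "rt", "item_id", "value")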