Create a formatter:
import org.joda.time.format.DateTimeFormat

val fmt = DateTimeFormat.forPattern("MM/dd/yyyy")
Parse the date:
val dt = fmt.parseDateTime("09/11/2015")
Get the day of the week:
dt.toString("EEEEE") // => "Friday" (four or more E's give the full day name)
Wrap it using org.apache.spark.sql.functions.udf and you have a complete solution.
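A rough sketch of such a wrapper (dayOfWeek is an illustrative name; the formatter is built inside the function because Joda-Time's DateTimeFormatter is not serializable):
import org.apache.spark.sql.functions.udf

// Illustrative UDF combining the Joda-Time steps above.
val dayOfWeek = udf((date: String) =>
  DateTimeFormat.forPattern("MM/dd/yyyy").parseDateTime(date).toString("EEEEE"))

// Usage: df.select(dayOfWeek($"date_string")).show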
Still, there is no need for that, since HiveContext already provides all the required UDFs: unix_timestamp parses a string into seconds since the Unix epoch, and from_unixtime formats those seconds back using a SimpleDateFormat pattern:
// Assumes sqlContext is a HiveContext, as in a Hive-enabled spark-shell
import sqlContext.implicits._

val df = sc.parallelize(Seq(
  Tuple1("08/11/2015"), Tuple1("09/11/2015"), Tuple1("09/12/2015")
)).toDF("date_string")

df.registerTempTable("df")
sqlContext.sql(
  """SELECT date_string,
            from_unixtime(unix_timestamp(date_string, 'MM/dd/yyyy'), 'EEEEE') AS dow
     FROM df"""
).show
// +-----------+--------+
// |date_string|     dow|
// +-----------+--------+
// | 08/11/2015| Tuesday|
// | 09/11/2015|  Friday|
// | 09/12/2015|Saturday|
// +-----------+--------+
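The same Hive UDFs can also be reached from the DataFrame API without registering a temp table, for example via selectExpr (a sketch equivalent to the query above; it still requires the HiveContext):
df.selectExpr(
  "date_string",
  "from_unixtime(unix_timestamp(date_string, 'MM/dd/yyyy'), 'EEEEE') AS dow"
).show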
EDIT:
Since Spark 1.5 you can use the from_unixtime and unix_timestamp functions directly:
import org.apache.spark.sql.functions.{from_unixtime, unix_timestamp}

df.select(from_unixtime(
  unix_timestamp($"date_string", "MM/dd/yyyy"), "EEEEE").alias("dow"))
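Appending .show should print the same day names as in the example above:
// +--------+
// |     dow|
// +--------+
// | Tuesday|
// |  Friday|
// |Saturday|
// +--------+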