Create a formatter:
import org.joda.time.format.DateTimeFormat

val fmt = DateTimeFormat.forPattern("MM/dd/yyyy")
Parse the date:
val dt = fmt.parseDateTime("09/11/2015")
Get the day of the week:
dt.toString("EEEEE") // => "Friday" (four or more E's give the full day name)
Wrap it using org.apache.spark.sql.functions.udf and you have a complete solution.
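A rough sketch of such a wrapper (dayOfWeek is an illustrative name; the formatter is built inside the function because Joda-Time's DateTimeFormatter is not serializable):
import org.apache.spark.sql.functions.udf

// Illustrative UDF combining the Joda-Time steps above.
val dayOfWeek = udf((date: String) =>
  DateTimeFormat.forPattern("MM/dd/yyyy").parseDateTime(date).toString("EEEEE"))

// Usage: df.select(dayOfWeek($"date_string")).show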
Still, there is no need for that, since HiveContext already provides all the required UDFs: unix_timestamp parses a string into seconds since the Unix epoch, and from_unixtime formats those seconds back using a SimpleDateFormat pattern:
// Assumes sqlContext is a HiveContext, as in a Hive-enabled spark-shell
import sqlContext.implicits._

val df = sc.parallelize(Seq(
  Tuple1("08/11/2015"), Tuple1("09/11/2015"), Tuple1("09/12/2015")
)).toDF("date_string")

df.registerTempTable("df")
sqlContext.sql(
  """SELECT date_string,
            from_unixtime(unix_timestamp(date_string, 'MM/dd/yyyy'), 'EEEEE') AS dow
     FROM df"""
).show
// +-----------+--------+
// |date_string|     dow|
// +-----------+--------+
// | 08/11/2015| Tuesday|
// | 09/11/2015|  Friday|
// | 09/12/2015|Saturday|
// +-----------+--------+
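The same Hive UDFs can also be reached from the DataFrame API without registering a temp table, for example via selectExpr (a sketch equivalent to the query above; it still requires the HiveContext):
df.selectExpr(
  "date_string",
  "from_unixtime(unix_timestamp(date_string, 'MM/dd/yyyy'), 'EEEEE') AS dow"
).show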
EDIT:
Since Spark 1.5 you can use the from_unixtime and unix_timestamp functions directly:
import org.apache.spark.sql.functions.{from_unixtime, unix_timestamp}

df.select(from_unixtime(
  unix_timestamp($"date_string", "MM/dd/yyyy"), "EEEEE").alias("dow"))
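Appending .show should print the same day names as in the example above:
// +--------+
// |     dow|
// +--------+
// | Tuesday|
// |  Friday|
// |Saturday|
// +--------+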