
I have very little experience with PySpark and I have been trying, with no success, to create 3 new columns from a column that contains the timestamp of each row.

The column containing the date has the following format: EEE MMM dd HH:mm:ss Z yyyy. So it looks like this:

+--------------------+
|           timestamp|
+--------------------+
|Fri Oct 18 17:07:...|
|Mon Oct 21 21:49:...|
|Thu Oct 31 18:03:...|
|Sun Oct 20 15:00:...|
|Mon Sep 30 23:35:...|
+--------------------+

The 3 columns have to contain: the day of the week as an integer (so 0 for Monday, 1 for Tuesday, ...), the number of the month, and the year. What is the most effective way to create these 3 additional columns and append them to the PySpark DataFrame? Thanks in advance!

Lamanus
    what have you tried? please read [ask] – mck Jun 18 '21 at 13:01
  • please read as well: [how-to-make-good-reproducible-apache-spark-examples](https://stackoverflow.com/questions/48427185/how-to-make-good-reproducible-apache-spark-examples) – Hansanho Jun 18 '21 at 17:33

1 Answer


Spark 1.5 and higher has many date-processing functions. Here are some that may be useful for you:

from pyspark.sql.functions import col, dayofweek, month, year

# these functions expect a date/timestamp column, not a raw string
df = df.withColumn('dayOfWeek', dayofweek(col('your_date_column')))  # 1 = Sunday ... 7 = Saturday
df = df.withColumn('month', month(col('your_date_column')))
df = df.withColumn('year', year(col('your_date_column')))
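
Since the timestamp column in the question is a string in the format EEE MMM dd HH:mm:ss Z yyyy, it needs to be parsed into an actual timestamp first, otherwise the functions above return null. Below is a minimal sketch of the full flow, assuming the string column is named timestamp and the SparkSession is available as spark; on Spark 3+ the 'E' pattern letter is not allowed when parsing, so the legacy parser policy is enabled here (Spark 2.x does not need that line). The dayofweek result is also shifted so that Monday maps to 0, as asked.

from pyspark.sql.functions import col, to_timestamp, dayofweek, month, year

# Spark 3+ rejects the 'E' pattern when parsing, so fall back to the legacy parser
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

# parse the string column into a proper timestamp
df = df.withColumn("ts", to_timestamp(col("timestamp"), "EEE MMM dd HH:mm:ss Z yyyy"))

# dayofweek returns 1 (Sunday) .. 7 (Saturday); shift so that Monday = 0, ..., Sunday = 6
df = (df
      .withColumn("dayOfWeek", (dayofweek(col("ts")) + 5) % 7)
      .withColumn("month", month(col("ts")))
      .withColumn("year", year(col("ts"))))

On recent Spark versions the same Monday-0 numbering is also available directly via the SQL weekday function, e.g. expr("weekday(ts)").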
XXavier