
I tried to convert a date in the format 2018-07-12 to ddMMyy using to_date, but I get null after the conversion:

from pyspark.sql.functions import to_date

df = spark.createDataFrame([('2018-07-12',)], ['Date_col'])

df = df.withColumn('new_date', to_date('Date_col', 'ddMMyy'))

I need to use this logic to convert the DataFrame column. I am new to Spark programming and have tried a lot of solutions, but nothing helps.

I also need to concatenate the ddMMyy from one column with the hhss from another column.

Any help please?

user10813834
  • can you please specify the final output you are looking for? – Nadim Younes Jan 14 '20 at 13:52
  • the final output should be 120718 in the above case – user10813834 Jan 14 '20 at 13:59
  • Possible duplicate of [Pyspark changing type of column from date to string](https://stackoverflow.com/questions/32977360/pyspark-changing-type-of-column-from-date-to-string) and [pyspark convert dataframe column from timestamp to string of “YYYY-MM-DD” format](https://stackoverflow.com/questions/48910511/pyspark-convert-dataframe-column-from-timestamp-to-string-of-yyyy-mm-dd-format). – pault Jan 14 '20 at 14:53
  • Also based on the sentence *I need to concat the ddMMyy from one column and hhss from other column*, it seems that this could be an [XY Problem](http://www.xyproblem.info). There may be a better solution but it's hard to tell without seeing a [mcve]. – pault Jan 14 '20 at 14:58

2 Answers


You can use the built-in pyspark.sql.functions.date_format(date, format) function:

from pyspark.sql.functions import date_format

df = spark.createDataFrame([('2018-07-12',)], ['dt'])
df.select(date_format('dt', 'ddMMyy').alias('date')).collect()

Results:

[Row(date='120718')]

Reference: https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=dateformat#pyspark.sql.functions.date_format
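
Since the question also asks about concatenating the ddMMyy from one column with the hhss from another, date_format can be combined with concat. A minimal sketch, assuming a hypothetical Time_col holding a timestamp string and taking the 'hhss' pattern literally (swap in 'HHmm' or similar if that was the intent):

from pyspark.sql.functions import concat, date_format

df2 = spark.createDataFrame(
    [('2018-07-12', '2018-07-12 09:30:45')],
    ['Date_col', 'Time_col']  # Time_col is an assumed column name
)

# Format each column separately, then concatenate the two strings
df2 = df2.withColumn(
    'combined',
    concat(date_format('Date_col', 'ddMMyy'),
           date_format('Time_col', 'hhss'))
)
df2.show()
+----------+-------------------+----------+
|  Date_col|           Time_col|  combined|
+----------+-------------------+----------+
|2018-07-12|2018-07-12 09:30:45|1207180945|
+----------+-------------------+----------+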

Foxan Ng

First of all, let's create the DataFrame:

df = spark.createDataFrame([('2018-07-12',)], ['Date_col'])
df.show()
+----------+
|  Date_col|
+----------+
|2018-07-12|
+----------+

Then we define a UDF for the conversion.

from datetime import datetime
import pyspark.sql.types as T
import pyspark.sql.functions as F


def user_defined_timestamp(date_col):
    # Parse the yyyy-MM-dd string, then re-format it as ddMMyy
    _date = datetime.strptime(date_col, '%Y-%m-%d')
    return _date.strftime('%d%m%y')

user_defined_timestamp_udf = F.udf(user_defined_timestamp, T.StringType())

And at the end we apply our function to the DataFrame in order to create the column we want.

df = df.withColumn('new_date', user_defined_timestamp_udf('Date_col'))
df.show()
+----------+--------+
|  Date_col|new_date|
+----------+--------+
|2018-07-12|  120718|
+----------+--------+
lukaszKielar
  • Thanks for this, I will try it. Just a question: why do we actually define a UDF function for the conversion of the date? Do I need to define a function whenever I convert a date? – user10813834 Jan 14 '20 at 14:42
  • You are able to create any function you want using a UDF. I don't know if you can use a custom timestamp format while using the `to_date` PySpark function. – lukaszKielar Jan 14 '20 at 14:53
  • [Do not use a `udf` for this](https://stackoverflow.com/questions/38296609/spark-functions-vs-udf-performance). You can use [`date_format`](http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.functions.date_format) as shown in [this post](https://stackoverflow.com/questions/32977360/pyspark-changing-type-of-column-from-date-to-string). – pault Jan 14 '20 at 14:55
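
For reference, the UDF-free version that pault's comment points to, a minimal sketch using the built-in date_format (it produces the same output as the UDF above):

import pyspark.sql.functions as F

df = spark.createDataFrame([('2018-07-12',)], ['Date_col'])

# date_format formats a date/timestamp column as a string, no UDF needed
df = df.withColumn('new_date', F.date_format('Date_col', 'ddMMyy'))
df.show()
+----------+--------+
|  Date_col|new_date|
+----------+--------+
|2018-07-12|  120718|
+----------+--------+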