I have a requirement to derive a date value and calculate the End Of Month difference. The source has a string column "a" with values in YYYYMMDD format, but in the target the column has to be mapped based on the condition below:

EOM(to_date(a, "DD.MM.YYYY")) >= EOM(current_date)   # output is of date type

I'm able to create the current_date value by importing datetime in Python, but I'm unable to convert the first part of the expression. Could you help with how to achieve this?

Rocky1989
  • Can you include the code for the part that you got working? – sshashank124 Dec 29 '19 at 06:17
  • Not yet, I'm having a problem converting the dd.mm.yyyy format, as this string cannot be converted to date type using the to_date function in PySpark. Please share your idea. Thank you – Rocky1989 Dec 29 '19 at 11:03

1 Answer

This is more a Python question than a PySpark one.

In Python you can do:

from datetime import datetime

# Parse a dd.mm.yyyy string into a time.struct_time
strip_time = datetime.strptime("01.01.2020", '%d.%m.%Y').timetuple()
print(strip_time)

and get:

time.struct_time(tm_year=2020, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=-1)

You can also do:

from time import mktime, time

# mktime() converts the struct_time to epoch seconds so it can be compared with time()
print(mktime(strip_time) > time())

and get

True

So in PySpark, with an RDD, you can first map your values using datetime.strptime(), then filter by comparing to time(), as in the sketch below.
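
A minimal sketch of that RDD approach (the sample values and the existing SparkSession named "spark" are assumptions for illustration):

from datetime import datetime
from time import mktime, time

rdd = spark.sparkContext.parallelize(["01.01.2020", "15.06.2019"])

# Parse each dd.mm.yyyy string to epoch seconds, then keep only dates in the future
epochs = rdd.map(lambda s: mktime(datetime.strptime(s, '%d.%m.%Y').timetuple()))
print(epochs.filter(lambda t: t > time()).collect())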

If you are working with DataFrames you can look here: Convert pyspark string to date format
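
For the original end-of-month requirement, a DataFrame version could use to_date() together with last_day(), which returns the last day of the month for a given date. This is only a sketch: the column name "a" comes from the question, the sample rows are invented, and if the source strings are really YYYYMMDD you would use the pattern "yyyyMMdd" instead of "dd.MM.yyyy":

from pyspark.sql import functions as F

df = spark.createDataFrame([("01.01.2020",), ("15.06.2019",)], ["a"])

# last_day() gives the end-of-month date for each parsed value,
# which can then be compared against the EOM of the current date
result = df.withColumn(
    "eom_flag",
    F.last_day(F.to_date(F.col("a"), "dd.MM.yyyy")) >= F.last_day(F.current_date())
)
result.show()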

user3689574
  • Actually the problem is, for example: now = datetime.datetime.strptime("01.01.2020", '%d.%m.%Y').timetuple() or now = datetime.datetime.strptime("01.01.2020", '%d.%m.%Y').time() always gives output in a format like 2020-01-01, but I need it as 01.01.2020 – Rocky1989 Dec 29 '19 at 09:38
  • You need a string, not a date? Then why not just do a split on the original string and join it the way you want? – user3689574 Dec 29 '19 at 10:45
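
If a dd.mm.yyyy string is the goal, strftime() can also produce it directly instead of splitting and joining by hand; a small sketch, assuming the source value is a YYYYMMDD string as described in the question:

from datetime import datetime

# Parse the YYYYMMDD source string, then format it back out as dd.mm.yyyy
print(datetime.strptime("20200101", '%Y%m%d').strftime('%d.%m.%Y'))  # prints 01.01.2020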