I have a requirement to derive a date value and calculate the End Of Month difference. The source has a string column "a" with values in YYYYMMDD format, but in the target the column has to be mapped based on the condition below:

EOM(to_date(a, "DD.MM.YYYY")) >= EOM(current_date)   # output is of date type

I'm able to create the current_date value by importing datetime in Python, but I'm unable to convert the first part of the expression. Could you help with how to achieve this?

Rocky1989
  • Can you include the code for the part that you got working? – sshashank124 Dec 29 '19 at 06:17
  • Not yet, I'm having a problem converting the dd.mm.yyyy format, as this string cannot be converted to date type using the to_date function in PySpark. Please share your idea. Thank you – Rocky1989 Dec 29 '19 at 11:03

1 Answer

This is more a Python question than a PySpark one.

In Python you can do:

from datetime import datetime

# Parse a dd.mm.yyyy string into a time.struct_time
strip_time = datetime.strptime("01.01.2020", '%d.%m.%Y').timetuple()
print(strip_time)

and get:

time.struct_time(tm_year=2020, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=2, tm_yday=1, tm_isdst=-1)

You can also do:

from time import mktime, time

# mktime() converts the struct_time to epoch seconds so it can be compared with time()
print(mktime(strip_time) > time())

and get

True

So in PySpark, with an RDD, you can first map your values using datetime.strptime(), then filter by comparing to time(), as in the sketch below.
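
A minimal sketch of that RDD approach (the sample values and the existing SparkSession named "spark" are assumptions for illustration):

from datetime import datetime
from time import mktime, time

rdd = spark.sparkContext.parallelize(["01.01.2020", "15.06.2019"])

# Parse each dd.mm.yyyy string to epoch seconds, then keep only dates in the future
epochs = rdd.map(lambda s: mktime(datetime.strptime(s, '%d.%m.%Y').timetuple()))
print(epochs.filter(lambda t: t > time()).collect())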

If you are working with DataFrames you can look here: Convert pyspark string to date format
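
For the original end-of-month requirement, a DataFrame version could use to_date() together with last_day(), which returns the last day of the month for a given date. This is only a sketch: the column name "a" comes from the question, the sample rows are invented, and if the source strings are really YYYYMMDD you would use the pattern "yyyyMMdd" instead of "dd.MM.yyyy":

from pyspark.sql import functions as F

df = spark.createDataFrame([("01.01.2020",), ("15.06.2019",)], ["a"])

# last_day() gives the end-of-month date for each parsed value,
# which can then be compared against the EOM of the current date
result = df.withColumn(
    "eom_flag",
    F.last_day(F.to_date(F.col("a"), "dd.MM.yyyy")) >= F.last_day(F.current_date())
)
result.show()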

user3689574
  • Actually the problem is, for example: now = datetime.datetime.strptime("01.01.2020", '%d.%m.%Y').timetuple() or now = datetime.datetime.strptime("01.01.2020", '%d.%m.%Y').time() always gives output in a format like 2020-01-01, but I need it as 01.01.2020 – Rocky1989 Dec 29 '19 at 09:38
  • You need a string, not a date? Then why not just do a split on the original string and join it the way you want? – user3689574 Dec 29 '19 at 10:45
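
If a dd.mm.yyyy string is the goal, strftime() can also produce it directly instead of splitting and joining by hand; a small sketch, assuming the source value is a YYYYMMDD string as described in the question:

from datetime import datetime

# Parse the YYYYMMDD source string, then format it back out as dd.mm.yyyy
print(datetime.strptime("20200101", '%Y%m%d').strftime('%d.%m.%Y'))  # prints 01.01.2020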