
I have a huge dataset; for illustration, I have simplified it below:

date    product price   amount
201901  A   10  20
201902  A   10  20
201903  A   20  30
201904  C   40  50

In this dataset, the date column holds year + week number (e.g. 201901 = week 1 of 2019) as a string, and I am trying to convert it to date type in a PySpark DataFrame. Is there an efficient way to cast the date column to "date" type?

asked by user3104352
  • Does this work? `df['parsed_date'] = df['date'].apply(lambda s: datetime.strptime(s, '%Y%U'))` By the way, https://strftime.org/ is a memorable webpage to look through the `strftime` directives. – Ankur Jan 21 '21 at 04:24
  • Actually, week number is more complicated as mentioned here: https://stackoverflow.com/questions/17087314/get-date-from-week-number. I think parsing it correctly would require knowing exactly what the week number represents in the input data. In any case, a week number is not enough, we will need a day of the week as well. – Ankur Jan 21 '21 at 04:38
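As the comments point out, a year-week string alone does not identify a unique day, so a day of the week has to be supplied. A minimal pure-Python sketch (assuming `%U`-style Sunday-first week numbering, which may not match how your data was generated):

```python
from datetime import datetime

# Append a day-of-week digit (%w, where 1 = Monday) so that a
# year + week-number string resolves to a single, unambiguous date.
# %U counts Sunday-first weeks; days before the first Sunday are week 0.
parsed = datetime.strptime("201901" + "1", "%Y%U%w")
print(parsed.date())  # -> 2019-01-07 (the Monday of week 1 of 2019)
```

If the source data uses ISO week numbers instead, `%G%V%u` (available since Python 3.6) would be the appropriate directives.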

1 Answer


You can use to_date with a suitable datetime pattern string:

import pyspark.sql.functions as F

# Week-based patterns such as 'ww' require the legacy
# (SimpleDateFormat) parser in Spark 3+
spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
df2 = df.withColumn('date', F.to_date('date', 'yyyyww'))
answered by mck