2

I cannot find any datetime formatting pattern to get the week of month in spark 3.0+

As use of 'W' is deprecated, is there a solution to get week of month without using legacy option?

The below code doesn't work for spark 3.2.1

df = df.withColumn("weekofmonth", f.date_format(f.col("Date"), "W"))
ZygD
  • 22,092
  • 39
  • 79
  • 102
Kavishka Gamage
  • 102
  • 2
  • 10

3 Answers3

3

For completeness, it's worth mentioning that one can set the configuration to "LEGACY".

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
from pyspark.sql import functions as F

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

df = spark.createDataFrame(
    [('2022-07-01',),
     ('2022-07-02',),
     ('2022-07-03',)],
    ['Date'])
df.withColumn("weekofmonth", F.date_format(F.col("Date"), "W")).show()
# +----------+-----------+
# |      Date|weekofmonth|
# +----------+-----------+
# |2022-07-01|          1|
# |2022-07-02|          1|
# |2022-07-03|          2|
# +----------+-----------+
ZygD
  • 22,092
  • 39
  • 79
  • 102
1

you can try using udf:

from pyspark.sql.functions import col,year,month,dayofmonth

df = spark.createDataFrame(
    [(1, "2022-04-22"), (2, "2022-05-12")], ("id", "date"))

from calendar import monthcalendar
def get_week_of_month(year, month, day):
    return next(
        (
            week_number
            for week_number, days_of_week in enumerate(monthcalendar(year, month), start=1)
            if day in days_of_week
        ),
        None,
    )
fn1 = udf(get_week_of_month)
df =df.withColumn('week_of_mon',fn1(year(col('date')),month(col('date')),dayofmonth(col('date'))))
display(df)

enter image description here

Sudhin
  • 139
  • 7
  • This function gives incorrect values for a month like May (2022-05-12). Because 1st May consider as 0th week and it's the 6th day. So this function needs few modifications. – Kavishka Gamage May 23 '22 at 10:33
  • This works fine since 1st may is considered as 1st week, because your week starts from monday to sunday. So 1st may (sunday) falls in first week of the month. Try to check monthcalender() method. – Sudhin May 23 '22 at 15:17
  • To do the validation, I have used a calendar that goes from Sunday to Saturday. Sorry for the confusion and this method works fine. – Kavishka Gamage Jun 17 '22 at 12:33
1

If you have table with year, month and week numbers sorted by year and week, you may try my solution:

select 
      year_iso,
      month,
      posexplode(collect_list(week_iso)) as (week_of_month, week_iso)
from your_table_with_dates
group by year_iso, month

Here we just transform column week_iso into array grouped by year_iso and month, and then explodes it backward into two columns (position inside month and week_iso).

Note, positions starts in 0, but its not a real problem.

Screenshots of tables:

serge
  • 11
  • 1