
I need to fetch the week start date and week end date from a given date, taking into account that the week starts on Sunday and ends on Saturday.

I referred to this post, but it takes Monday as the starting day of the week. Is there any built-in function in Spark that can take care of this?

ben

3 Answers


Find the day of the week, then use selectExpr to derive the new columns, taking Sunday as the week start date:

from pyspark.sql import functions as F

df_b = spark.createDataFrame([('1', '2020-07-13')], ["ID", "date"])

# dayofweek() returns 1 for Sunday through 7 for Saturday
df_b = df_b.withColumn('day_of_week', F.dayofweek(F.col('date')))

# roll back to the preceding Sunday, then forward to the following Saturday
df_b = df_b.selectExpr('*', 'date_sub(date, day_of_week - 1) as week_start')
df_b = df_b.selectExpr('*', 'date_add(date, 7 - day_of_week) as week_end')

df_b.show()

+---+----------+-----------+----------+----------+
| ID|      date|day_of_week|week_start|  week_end|
+---+----------+-----------+----------+----------+
|  1|2020-07-13|          2|2020-07-12|2020-07-18|
+---+----------+-----------+----------+----------+
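If the helper column is not wanted, here is an alternative sketch, assuming a Spark version where trunc supports 'week' (Spark 3.x does; trunc to 'week' snaps to the Monday of that week): shift the date forward one day, truncate, then shift back one day to land on the Sunday. df_alt is just an illustrative name:

from pyspark.sql import functions as F

# Assumes trunc() accepts 'week' (truncates to Monday, Spark 3.x).
# Shift forward a day, truncate to Monday, shift back: Sunday week start.
df_alt = df_b.select('ID', 'date') \
    .withColumn('week_start', F.date_sub(F.trunc(F.date_add('date', 1), 'week'), 1)) \
    .withColumn('week_end', F.date_add('week_start', 6))
df_alt.show()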

Update in Spark SQL

Create a temporary view from the DataFrame first:

df_b.createOrReplaceTempView("df_b_sql")

Then run:

%sql
select *,
       date_sub(date, dayofweek - 1) as week_start,
       date_add(date, 7 - dayofweek) as week_end
from
(select *, dayofweek(date) as dayofweek
 from df_b_sql) T

Output

+---+----------+---------+----------+----------+
| ID|      date|dayofweek|week_start|  week_end|
+---+----------+---------+----------+----------+
|  1|2020-07-13|        2|2020-07-12|2020-07-18|
+---+----------+---------+----------+----------+
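The subquery above only exists to reuse the dayofweek alias; calling dayofweek(date) inline folds it away. A sketch of the same query issued through spark.sql, assuming the df_b_sql view created earlier:

spark.sql("""
    select *,
           date_sub(date, dayofweek(date) - 1) as week_start,
           date_add(date, 7 - dayofweek(date)) as week_end
    from df_b_sql
""").show()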
dsk
  • Can this be implemented in SQL? – ben Jul 16 '20 at 20:36
  • Are you using Spark SQL? Then we can do it. Add more information, like what IDE/language you are using. – dsk Jul 17 '20 at 08:11
  • Yeah, I am using Spark SQL. But how does language play a part here? It's just a SQL query. – ben Jul 17 '20 at 16:05
  • Sorry for the delayed response; I have updated my answer with Spark SQL code, please check. These are standard SQL functions, so the same logic should work in other SQL dialects too. – dsk Jul 21 '20 at 05:29

Perhaps this is helpful -

Load the test data

    val df = spark.sql("select cast('2020-07-12' as date) as date")
    df.show(false)
    df.printSchema()

    /**
      * +----------+
      * |date      |
      * +----------+
      * |2020-07-12|
      * +----------+
      *
      * root
      * |-- date: date (nullable = true)
      */

week starting from SUNDAY and ending SATURDAY


    // week starting from SUNDAY and ending SATURDAY
    df.withColumn("week_end", next_day($"date", "SAT"))
      .withColumn("week_start", date_sub($"week_end", 6))
      .show(false)

    /**
      * +----------+----------+----------+
      * |date      |week_end  |week_start|
      * +----------+----------+----------+
      * |2020-07-12|2020-07-18|2020-07-12|
      * +----------+----------+----------+
      */

week starting from MONDAY and ending SUNDAY


    // week starting from MONDAY and ending SUNDAY
    df.withColumn("week_end", next_day($"date", "SUN"))
      .withColumn("week_start", date_sub($"week_end", 6))
      .show(false)

    /**
      * +----------+----------+----------+
      * |date      |week_end  |week_start|
      * +----------+----------+----------+
      * |2020-07-12|2020-07-19|2020-07-13|
      * +----------+----------+----------+
      */

week starting from TUESDAY and ending MONDAY

    // week starting from TUESDAY and ending MONDAY
    df.withColumn("week_end", next_day($"date", "MON"))
      .withColumn("week_start", date_sub($"week_end", 6))
      .show(false)

    /**
      * +----------+----------+----------+
      * |date      |week_end  |week_start|
      * +----------+----------+----------+
      * |2020-07-12|2020-07-13|2020-07-07|
      * +----------+----------+----------+
      */
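One caveat raised in the comments below: next_day returns the first matching day strictly after its input, so a date that already falls on the boundary day (e.g. '2016-06-25', a Saturday) gets pushed a full week ahead. A minimal fix is to anchor next_day one day earlier; sketched here in PySpark, and the same idea carries over to Scala:

    from pyspark.sql import functions as F

    # Inclusive boundary: step back one day so a date that is already a
    # Saturday stays in its own week instead of jumping to the next one.
    df.withColumn("week_end", F.next_day(F.date_sub("date", 1), "SAT")) \
      .withColumn("week_start", F.date_sub("week_end", 6)) \
      .show(truncate=False)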
Som
  • Seems something is wrong. If the date is 2020-07-12, then week_start should be 2020-07-12 and week_end should be 2020-07-18, but I am getting 2020-07-05 and 2020-07-11. – ben Jul 15 '20 at 12:04
  • Haven't tested these corner cases. Thanks for notifying. – Som Jul 15 '20 at 12:30
  • @ben, please check the update. I think this is more generic and can be used for weeks starting from any `DAY` – Som Jul 15 '20 at 15:41
  • @downvoters, can you please check the update. Also please let me know if it's not working for you. – Som Jul 16 '20 at 01:32
  • Your answer still does not work. Try out date '2016-06-25'. – ben Jul 16 '20 at 19:40
  • I was able to get this to work for a week starting on Sunday by changing "SAT" to "SUN". So I get the next Sunday, then subtract from it to get the Sunday at the start of the week... e.g. ` // week starting from SUNDAY and ending SATURDAY df.withColumn("week_end", next_day($"date", "SUN")) .withColumn("week_start", date_sub($"week_end", 6)) .show(false)` – BradP Feb 15 '22 at 23:57

Find the start date and end date of the week in a PySpark DataFrame, with Monday being the first day of the week:

from pyspark.sql.functions import col, dayofweek, expr, when

def add_start_end_week(dataframe, timestamp_col, StartDate, EndDate):
    """
    Function:
        Get the start date and the end date of the week
    args
        dataframe: spark dataframe
        timestamp_col: timestamp column based on which we calculate the start and end dates
        StartDate: start date column name of week
        EndDate: end date column name of week
    """
    # dayofweek() returns 1 for Sunday through 7 for Saturday
    dataframe = dataframe.withColumn(
        'day_of_week', dayofweek(col(timestamp_col)))
    # Start of the week (Monday as first day)
    dataframe = dataframe.withColumn(StartDate, when(col("day_of_week") > 1,
                                                     expr("date_add(date_sub({}, day_of_week - 1), 1)".format(timestamp_col)))
                                                .otherwise(expr("date_sub({}, 6)".format(timestamp_col))))
    # End of the week (Sunday as last day)
    dataframe = dataframe.withColumn(EndDate, when(col("day_of_week") > 1,
                                                   expr("date_add(date_add({}, 7 - day_of_week), 1)".format(timestamp_col)))
                                              .otherwise(col(timestamp_col)))

    return dataframe

Validate the above function:

df = spark.createDataFrame([('2021-09-26',), ('2021-09-25',), ('2021-09-24',), ('2021-09-23',),
                            ('2021-09-22',), ('2021-09-21',), ('2021-09-20',)], ['dt'])
dataframe = add_start_end_week(df, 'dt', 'StartDate', 'EndDate')
dataframe.show()
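With Monday-based weeks, all seven sample dates fall in the same week, so (hand-checked against the rules above) the show() output should be:

+----------+-----------+----------+----------+
|        dt|day_of_week| StartDate|   EndDate|
+----------+-----------+----------+----------+
|2021-09-26|          1|2021-09-20|2021-09-26|
|2021-09-25|          7|2021-09-20|2021-09-26|
|2021-09-24|          6|2021-09-20|2021-09-26|
|2021-09-23|          5|2021-09-20|2021-09-26|
|2021-09-22|          4|2021-09-20|2021-09-26|
|2021-09-21|          3|2021-09-20|2021-09-26|
|2021-09-20|          2|2021-09-20|2021-09-26|
+----------+-----------+----------+----------+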
Saurabh Kukreti