1

I need to calculate the number of consecutive days from today (2022-01-04) backwards a client logged in my application. I need to use pyspark due to the size of my database

Input

Name       Date         
    John    2022-01-01  
    John    2022-01-01
    Mary    2022-01-01
    Steve   2022-01-03
    Mary    2022-01-03
    John    2022-01-02
    John    2022-01-03
    Mary    2022-01-04
    John    2022-01-04

Output

Name       consecutive_days          
    John         4
    Mary         2 
karek77
  • 31
  • 3
  • what should happen in case a name has 2 consecutive days then a break then again 2 consecutive days? would hat be counted as 4 consecutive days against that name? – samkart Aug 05 '22 at 06:13

2 Answers2

1

You can calculate the number of days between to dates using datediff and lag. Then count the dates that have the difference as 1. Add a 1 to that to get your number of consecutive days.

data_sdf. \
    select('name', 'dt'). \
    dropDuplicates(). \
    withColumn('gap', 
               func.datediff('dt', func.lag('dt').over(wd.partitionBy('name').orderBy('dt')))
               ). \
    filter(func.col('gap') == 1). \
    groupBy('name'). \
    agg((func.sum('gap') + 1).alias('consecutive_days')). \
    show()

# +----+----------------+
# |name|consecutive_days|
# +----+----------------+
# |Mary|               2|
# |John|               4|
# +----+----------------+
samkart
  • 6,007
  • 2
  • 14
  • 29
-1

Maybe you could do something like this,

from datetime import date

d1 = date(2022, 2, 6)
d2 = date(2022, 2, 16)
X = d2 - d1
print("Consecutive Days = ",X.days)
Mark Jacobs
  • 149
  • 1
  • 1
  • 7
  • This wouldn't give consecutive days. – Pranav Hosangadi Aug 04 '22 at 21:16
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Aug 06 '22 at 05:21