0

So I have a set of 50 dates I have specified 7 here for example

df["CreatedDate"] = pd.DataFrame('09-08-16 0:00','22-08-16 0:00','23-08-16 0:00',28-08-16 0:00,'29-08-16 0:00','30-08-16 0:00','31-08-16 0:00')
df["CreatedDate"] = pd.to_datetime(df4.CreatedDate)
df4["DAY"] = df4.CreatedDate.dt.day

How to find the continuous days which form a streak range [1-3],[4-7],[8-15],[>=16]

 Streak Count 
 1-3     3    #(9),(22,23) are in range [1-3]
 4-7     1    #(28,29,30,31) are in range [4-7]
 8-15    0
 >=16    0

let's just say the product (pen) has been launched 2 yrs back we are taking the dataset for last 10 months from today and from what I want to find is that if people are buying that pen continuously for 1 or 2 or 3 days and if yes place the count [1-3] and if they are buying it continuously for 4 or 5 or 6 or 7 days we place the count in [4- 7] and so on for other ranges

I dont know which condition to specify to match the criteria

san
  • 173
  • 6
  • how is this different from your other [question](https://stackoverflow.com/questions/50617233/adding-specific-days-in-python-table) which already has a nice answer that does this among other things? – filippo Jun 01 '18 at 07:20
  • I didn't get an answer yet. – san Jun 01 '18 at 07:23
  • do you want the days since the oldest timestamp? it wasn't that clear in the other question too. I mean you want to group your data in streaks since the first day the product was on sale? I don't understand how `9,22,23` are in the same group – filippo Jun 01 '18 at 07:29
  • no let's just say the product (pen) has been launched 2 yrs back we are taking the dataset for last 10 months from today and from what I want to find is that if people are buying that pen continuously for 1 or 2 or 3 days and if yes place the count [1-3] and if they are buying it continuously for 4 or 5 or 6 or 7 days we place the count in [4- 7] and so on for other ranges – san Jun 01 '18 at 07:37
  • In 9(only 1 day) the products are sold, after 9 there is no continuous days and a 1-day streak is in between the range [1-3]. In 22 and 23 the products are being sold(2 days streak) which is also in the range [1-3] – san Jun 01 '18 at 08:04
  • thanks, much clearer now – filippo Jun 01 '18 at 08:04
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/172205/discussion-between-san-and-filippo). – san Jun 01 '18 at 08:09

2 Answers2

1

I believe need:

df4 = pd.DataFrame({'CreatedDate':['09-08-16 0:00','22-08-16 0:00','23-08-16 0:00','28-08-16 0:00','29-08-16 0:00','30-08-16 0:00','31-08-16 0:00']})
df4["CreatedDate"] = pd.to_datetime(df4.CreatedDate)

df4 = df4.sort_values("CreatedDate")
count = df4.groupby((df4["CreatedDate"].diff().dt.days > 1).cumsum()).size()
print (count)
CreatedDate
0    2
1    4
2    1
dtype: int64

a = (pd.cut(count, bins=[0,3,7,15,31], labels=['1-3', '4-7','8-15', '>=16'])
       .value_counts()
       .sort_index()
       .rename_axis('Streak')
       .reset_index(name='Count'))
print (a)
  Streak  Count
0    1-3      2
1    4-7      1
2   8-15      0
3   >=16      0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • https://stackoverflow.com/questions/50621605/how-to-add-the-values-for-specific-days-in-python-table-for-a-given-range can you help me with this? – san Jun 01 '18 at 10:02
0

Here's an attempt, binning is the same as @jezrael (except the last bin which I'm not sure should be limited to 31... is there a way to have open intervals with pd.cut?)

import pandas as pd

df = pd.DataFrame({ "CreatedDate": ['09-08-16 0:00','22-08-16 0:00','23-08-16 0:00','28-08-16 0:00','29-08-16 0:00','30-08-16 0:00','31-08-16 0:00']})
df["CreatedDate"] = pd.to_datetime(df.CreatedDate)

# sort by date
df = df.sort_values("CreatedDate")

# group consecutive dates
oneday = pd.Timedelta("1 day")
df["groups"] = (df.diff() > oneday).cumsum()
counts = df.groupby("groups").count()["CreatedDate"]

# bin
streaks = (pd.cut(counts, bins=[0,3,7,15,1000000], labels=['1-3', '4-7','8-15', '>=16'])
           .value_counts()
           .rename_axis("streak")
           .reset_index(name="count"))

print(streaks)

  streak  count
0    1-3      2
1    4-7      1
2   >=16      0
3   8-15      0
filippo
  • 5,197
  • 2
  • 21
  • 44