3

Input dataframe(Sample)

Date    Location    Value
01-01-2020  Loc1    27.2
02-01-2020  Loc1    41.9
03-01-2020  Loc1    29.8
04-01-2020  Loc1    7.8
05-01-2020  Loc1    44
06-01-2020  Loc1    0.4
07-01-2020  Loc1    0.8
08-01-2020  Loc1    4.1
09-01-2020  Loc1    4
10-01-2020  Loc1    6.2
11-01-2020  Loc1    54.5
12-01-2020  Loc1    24.8
13-01-2020  Loc1    0
.
.
.
.
01-01-2020  Loc2    6
02-01-2020  Loc2    40.2
03-01-2020  Loc2    2.6
04-01-2020  Loc2    10.2
05-01-2020  Loc2    12
06-01-2020  Loc2    3.2
07-01-2020  Loc2    0
08-01-2020  Loc2    2.4
09-01-2020  Loc2    0
10-01-2020  Loc2    1.2
11-01-2020  Loc2    19.2
12-01-2020  Loc2    21.8
13-01-2020  Loc2    13.6
....

I want to add another column 'Rating', populated using the below logic

Rating  Condition
1       Less than 150 days of data

2       150 to 200 days

3       200 to 250 days

4       250 to 300 days

5       All 365 days

Suppose Loc1 has data for 180 days, thus a rating of 2 for each row, similarly Loc2 has data for all 360 days and hence a rating of 5. So, the output data will look like

Date    Location    Value   Rating
01-01-2021  Loc1    27.2    2
02-01-2021  Loc1    41.9    2
03-01-2021  Loc1    29.8    2
04-01-2021  Loc1    7.8     2
05-01-2021  Loc1    44      2
06-01-2021  Loc1    0.4     2
07-01-2021  Loc1    0.8     2
08-01-2021  Loc1    4.1     2
09-01-2021  Loc1    4       2
10-01-2021  Loc1    6.2     2
11-01-2021  Loc1    54.5    2
12-01-2021  Loc1    24.8    2
13-01-2021  Loc1    0       2
.
.
.
.
01-01-2021  Loc2    6       5
02-01-2021  Loc2    40.2    5
03-01-2021  Loc2    2.6     5
04-01-2021  Loc2    10.2    5
05-01-2021  Loc2    12      5
06-01-2021  Loc2    3.2     5
07-01-2021  Loc2    0       5
08-01-2021  Loc2    2.4     5
09-01-2021  Loc2    0       5
10-01-2021  Loc2    1.2     5
11-01-2021  Loc2    19.2    5
12-01-2021  Loc2    21.8    5
13-01-2021  Loc2    13.6    5
.
.

Note: Date Column is a datetime obj.

I want to do this for the entire dataframe, how can I achieve this?

RoshanShah22
  • 400
  • 1
  • 3
  • 16

2 Answers2

3

You can find difference by maximal and minimal datetimes per groups, convert timedeltas to days and then use cut for binning:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

df['Days'] = df.groupby('Location')['Date'].transform(lambda x: x.max() - x.min()).dt.days

df['Rating'] = pd.cut(df['Days'], bins=[0, 150, 200, 250, 367], labels=False).add(1)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

You can do it by using groupby with transform and later applying the conditions with np.where :

df['Rating']=df.groupby('Location')['Date'].transform('count')
df['Rating']=np.where(df['Rating']<150,1,np.where(df['Rating']<200,2,np.where(df['Rating']<250,3,np.where(df['Rating']<300,4,np.where(df['Rating']==300,5,'')))))
Suhas Mucherla
  • 1,383
  • 1
  • 5
  • 17