1

I have a dataset of userids and the all the times they use a particular pass. I need to find out how many days since each of them first used the pass. I was thinking of running through the dataset and store the first use in a dictionary and minus it off today's date. I cant seem to get it to work.

Userid Start use Day
1712 2019-01-04 Friday
1712 2019-01-05 Saturday
9050 2019-01-04 Friday
9050 2019-01-04 Friday
9050 2019-01-06 Sunday
9409 2019-01-05 Saturday
9683 2019-05-20 Monday
8800 2019-05-17 Friday
8800 2019-05-17 Friday

This is the part of the dataset. Date format is Y-m-d

usedict={}  
keys = df.user_id  
values = df.start_date  
for i in keys:  
    if (usedict[i] == keys):  
      continue   
else:  
    usedict[i] = values[i]  
prints(usedict)  

user_id use_count days_used Ave Daily Trips register_date days_since_reg
12 42 23 1.826087 NaT NaT
17 28 13 2.153846 NaT NaT
114 54 24 2.250000 2019-02-04 107 days
169 31 17 1.823529 NaT NaT
1414 49 20 2.450000 NaT NaT
1712 76 34 2.235294 NaT NaT
2388 24 12 2.000000 NaT NaT
6150 10 5 2.000000 2019-02-05 106 days

abdillah
  • 61
  • 3
  • find the min start for each id? – QHarr May 23 '19 at 02:48
  • 1. Seems there're some indentation problem in your else branch. 2. you can assign value(date time) to each key(userid) in your `usedict` by adding a condition check if's the earliest date just like what bubble sorting does – HzCheng May 23 '19 at 03:10

2 Answers2

0

I am only looking at two columns but you could find the min for each id with groupby and then use apply to get difference (I have done difference in days)

import pandas as pd
import datetime

user_id = [1712, 1712, 9050, 9050, 9050, 9409, 9683, 8800, 8800]
start = ['2019-01-04', '2019-01-05', '2019-01-04', '2019-01-04', '2019-01-06', '2019-01-05', '2019-05-20', '2019-05-17', '2019-05-17']

df = pd.DataFrame(list(zip(user_id, start)), columns = ['UserId', 'Start'])
df['Start']= pd.to_datetime(df['Start']) 
df = df.groupby('UserId')['Start'].agg([pd.np.min])
now = datetime.datetime.now()
df['days'] = df['amin'].apply(lambda x: (now - x).days)
a_dict = pd.Series(df.days.values,index = df.index).to_dict()
print(a_dict)

References:

to_dict() method taken from @jeff


Output:

enter image description here

QHarr
  • 83,427
  • 12
  • 54
  • 101
0

You can achieve what you want with the following. I have used only 2 user ids from the example given by you, but the same will apply to all.

import pandas as pd
import datetime

df = pd.DataFrame([{'Userid':'1712','use_date':'2019-01-04'},
                    {'Userid':'1712','use_date':'2019-01-05'},
                    {'Userid':'9050','use_date':'2019-01-04'},
                    {'Userid':'9050','use_date':'2019-01-04'},
                    {'Userid':'9050','use_date':'2019-01-06'}])

df.use_date = pd.to_datetime(df.use_date).dt.date
group_df = df.sort_values(by='use_date').groupby('Userid', as_index=False).agg({'use_date':'first'}).rename(columns={'use_date':'first_use_date'})

group_df['diff_from_today'] = datetime.datetime.today().date() - group_df.first_use_date

The output is:

print(group_df)
  Userid first_use_date diff_from_today
0   1712     2019-01-04        139 days
1   9050     2019-01-04        139 days

Check sort_values and groupby for more details.

WebDev
  • 1,211
  • 11
  • 17