1

Assuming the following dataset has sorted list of dates:

dates=pd.DataFrame(data={'client':['1','2'],
                         'date':[['2012-3-10','2012-3-11','2012-3-12','2012-3-13','2012-3-14'],
                                 ['2012-3-12','2012-3-13','2012-3-16','2012-3-23']]})

sample dataset

I want to find the average date difference in terms of days So, For eg, for Client '2', Average Timelag would be 2.75

Stefan
  • 41,759
  • 13
  • 76
  • 81
pheno
  • 437
  • 1
  • 4
  • 13

2 Answers2

2

Starting with:

  client                                               date
0      1  [2012-3-10, 2012-3-11, 2012-3-12, 2012-3-13, 2...
1      2       [2012-3-12, 2012-3-13, 2012-3-16, 2012-3-23]

You could

dates.groupby('client')['date'].apply(lambda x: [i / np.timedelta64(1, 'D') for i in np.diff([pd.to_datetime(c) for c in x])[0]])

to get the timedelta in days:

client
1    [1.0, 1.0, 1.0, 1.0]
2         [1.0, 3.0, 7.0]

or

dates.groupby('client')['date'].apply(lambda x: np.mean([i / np.timedelta64(1, 'D') for i in np.diff([pd.to_datetime(c) for c in x])[0]]))

for the mean:

client
1    1.000000
2    3.666667
Stefan
  • 41,759
  • 13
  • 76
  • 81
0

This is a repeat of:

Difference between two dates?

It looks like you can use the datetime modules to parse the date and/or time strings you're importing into objects which support algebra.

https://docs.python.org/2/library/datetime.html

Cheers

Community
  • 1
  • 1
PetMetz
  • 75
  • 1
  • 7
  • My primary problem is not finding the difference in the dates. It's more about parsing through the list of values – pheno Feb 03 '16 at 02:42