1

I have two panda series, closedDate and createdDate, thats elements are pandas timestamps, class 'pandas._libs.tslib.Timestampclass 'pandas._libs.tslib.Timestamp.

I have subtracted those two panda series to make a list, age, of pandas timedelta.

closedDate = data.iloc[:,1]
createdDate = data.iloc[:,2]
age = [x-y for x,y in zip(closedDate, createdDate)]

Now, I want to get the average of age but with my line of code I get an error.

In: average_age = functools.reduce(lambda x, y: x + y, age) / len(age)

Out: OverflowError: int too big to convert

How can I fix this??

Thanks!

jpp
  • 159,742
  • 34
  • 281
  • 339
cool_beans
  • 131
  • 1
  • 5
  • 15

2 Answers2

2

You can and should aim to use vectorised functions for this task.

In this example, you can subtract one pd.Series from another. You can then use the mean method to calculate the average.

data = pd.DataFrame({'createdDate': [pd.Timestamp('2018-01-01'),
                                     pd.Timestamp('2018-05-01')],
                     'closedDate': [pd.Timestamp('2018-02-01'),
                                    pd.Timestamp('2018-07-01')]})

closedDate = data['closedDate']
createdDate = data['createdDate']

ages = closedDate - createdDate

print(ages)

# 0   31 days
# 1   61 days
# dtype: timedelta64[ns]

res = ages.mean()

print(res)

# 46 days 00:00:00

There are 2 main reasons why using vectorised functions are better in this case:

  1. Underlying pd.Timestamp are numeric arrays (source).
  2. zip on arrays underperforms versus zip on lists (source).
jpp
  • 159,742
  • 34
  • 281
  • 339
0

You should do this with series like jpp suggested, but barring that, heres how you could do it with the list you provided.

average_age = sum(age, timedelta()) / len(age)
M Sandler
  • 11
  • 4