I need to calculate the elapsed time between events. My task is similar to this one, but I get an error when I try to reproduce it:
print(df1.sort_values(['ip','timestamp']).head(20))
df1['diff'] = df1.sort_values(['ip','timestamp']).groupby('ip')['timestamp'].diff()
                 ip            timestamp
26422    1.0.150.87  2021-08-21 03:17:00
26192    1.0.150.87  2021-08-21 03:17:00
77885   1.0.155.191  2021-08-22 05:54:00
77387   1.0.155.191  2021-08-22 05:54:00
27240    1.0.227.92  2021-08-21 03:47:00
27009    1.0.227.92  2021-08-21 03:47:00
47641  1.10.130.122  2021-08-21 13:44:00
47279  1.10.130.122  2021-08-21 13:44:00
11912   1.10.202.23  2021-08-20 16:59:00
11825   1.10.202.23  2021-08-20 16:59:00
92     1.10.213.176  2021-08-20 12:02:00
96     1.10.213.176  2021-08-20 12:02:00
2580   1.10.213.176  2021-08-20 13:09:00
2572   1.10.213.176  2021-08-20 13:09:00
4518   1.10.213.176  2021-08-20 13:57:00
4491   1.10.213.176  2021-08-20 13:57:00
8057   1.10.214.251  2021-08-20 15:23:00
8017   1.10.214.251  2021-08-20 15:23:00
35302   1.10.219.41  2021-08-21 08:09:00
35030   1.10.219.41  2021-08-21 08:09:00
Traceback (most recent call last):
  File "./analyser.py", line 59, in <module>
    df1['diff'] = df1.sort_values(['ip','timestamp']).groupby('ip')['timestamp'].diff()
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 3607, in __setitem__
    self._set_item(key, value)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 3779, in _set_item
    value = self._sanitize_column(value)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 4501, in _sanitize_column
    return _reindex_for_setitem(value, self.index)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 10777, in _reindex_for_setitem
    raise err
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 10772, in _reindex_for_setitem
    reindexed_value = value.reindex(index)._values
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/series.py", line 4579, in reindex
    return super().reindex(index=index, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 4809, in reindex
    return self._reindex_axes(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 4830, in _reindex_axes
    obj = obj._reindex_with_indexers(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/generic.py", line 4874, in _reindex_with_indexers
    new_data = new_data.reindex_indexer(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 666, in reindex_indexer
    self.axes[axis]._validate_can_reindex(indexer)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3785, in _validate_can_reindex
    raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
I can't figure out why it is not working. Also, I wonder whether there is a better way to solve this, for example using 'native' Python functionality. Thank you for your help!
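
For reference, here is a rough sketch of what the 'native' Python route might look like, using only the standard library. It groups timestamps by IP, sorts them, and computes the gap to the previous event. The `rows` list and the `elapsed_per_ip` helper are hypothetical names made up for illustration, and the sample values are copied from the printed output above. As for the error itself: the traceback ends in pandas aligning the assigned Series against `df1.index`, which fails when index labels repeat, so `df1` probably has duplicate index labels somewhere beyond the 20 rows shown. That is an assumption based only on the traceback.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample input: (ip, timestamp string) pairs in arbitrary order.
rows = [
    ("1.10.213.176", "2021-08-20 13:09:00"),
    ("1.0.150.87", "2021-08-21 03:17:00"),
    ("1.10.213.176", "2021-08-20 12:02:00"),
    ("1.10.213.176", "2021-08-20 13:57:00"),
]


def elapsed_per_ip(rows):
    """Return {ip: [(timestamp, gap_to_previous_event_or_None), ...]}."""
    # Collect parsed timestamps per IP.
    by_ip = defaultdict(list)
    for ip, ts in rows:
        by_ip[ip].append(datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"))

    result = {}
    for ip, stamps in by_ip.items():
        stamps.sort()
        # The first event of each IP has no predecessor, hence None.
        gaps = [None] + [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        result[ip] = list(zip(stamps, gaps))
    return result


result = elapsed_per_ip(rows)
for ip, events in result.items():
    for ts, gap in events:
        print(ip, ts, gap)
```

Each gap is a `datetime.timedelta`, so repeated events with identical timestamps (as in the data above) would simply show up as zero-length gaps.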