
Suppose I have the following Pandas DataFrame. I want to compute the time (in seconds) since the last observation of each ip. Notice that the data is not necessarily ordered.

import pandas as pd

data = {'ip': [123, 326, 123, 326], 'hour': [14, 12, 12, 1], 'minute': [54, 23, 41, 8], 'second': [45, 29, 19, 33]}

df = pd.DataFrame(data)

    ip  hour  minute  second
0  123    14      54      45
1  326    12      23      29
2  123    12      41      19
3  326     1       8      33

For example, I would like to add a column whose first entry says that, when ip 123 was captured for the second time, the equivalent in seconds of (14:54:45 - 12:41:19) had elapsed since its last appearance in the dataset.

I am trying something with groupby but with no success. Any ideas?

Thanks in advance!!!

Raul Guarini Riva
  • Look here ("divmod answer"): https://stackoverflow.com/questions/1345827/how-do-i-find-the-time-difference-between-two-datetime-objects-in-python – Mika72 May 06 '18 at 15:45
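
The comment above points to a divmod-based approach. As a minimal sketch of that idea, assuming plain integers rather than a DataFrame, the two timestamps from the question can be turned into seconds and the difference split back into hours, minutes, and seconds:

```python
# Timestamps from the question, each expressed as seconds since midnight.
later = 14 * 3600 + 54 * 60 + 45    # 14:54:45
earlier = 12 * 3600 + 41 * 60 + 19  # 12:41:19

elapsed = later - earlier

# divmod returns (quotient, remainder), so two calls peel off hours, then minutes.
hours, remainder = divmod(elapsed, 3600)
minutes, seconds = divmod(remainder, 60)

print(elapsed, hours, minutes, seconds)
```

This only covers a single pair of times; the per-ip grouping still has to be done separately.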

2 Answers


You can convert your hour, minute, and second columns to a datetime using to_datetime, then group by ip and take the difference within each group (diff).

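# Zero-pad each component so that 1:08:33 becomes '010833' rather than the ambiguous '1833'
parts = df[['hour', 'minute', 'second']].astype(str).apply(lambda col: col.str.zfill(2))
df['Time'] = pd.to_datetime(parts.apply(''.join, axis=1), format='%H%M%S')

# Difference between consecutive rows of the same ip, in row order
df['Yourneed'] = df.groupby('ip').Time.diff().dt.total_seconds()
df
    ip  hour  minute  second                Time  Yourneed
0  123    14      54      45 1900-01-01 14:54:45       NaN
1  326    12      23      29 1900-01-01 12:23:29       NaN
2  123    12      41      19 1900-01-01 12:41:19   -8006.0
3  326     1       8      33 1900-01-01 01:08:33  -40496.0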
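
Because diff follows the order in which rows appear, the result can come out negative when the data is not sorted, which the question says may happen. A possible variation, sketched here under the assumption that all rows fall on the same day, is to build a time-of-day Timedelta with pd.to_timedelta and sort by it before taking the difference. The column names tod and elapsed_seconds are made up for this illustration:

```python
import pandas as pd

data = {
    'ip': [123, 326, 123, 326],
    'hour': [14, 12, 12, 1],
    'minute': [54, 23, 41, 8],
    'second': [45, 29, 19, 33],
}
df = pd.DataFrame(data)

# Time of day built directly from the numeric columns (no string parsing).
# Assumption: every observation belongs to the same day.
df['tod'] = (
    pd.to_timedelta(df['hour'], unit='h')
    + pd.to_timedelta(df['minute'], unit='min')
    + pd.to_timedelta(df['second'], unit='s')
)

# Sort chronologically and diff within each ip. Assigning the result back
# aligns on the index, so the original row order of df is preserved.
df['elapsed_seconds'] = (
    df.sort_values('tod')
      .groupby('ip')['tod']
      .diff()
      .dt.total_seconds()
)
print(df)
```

With this ordering the elapsed time lands on the later observation of each ip, and the earliest observation of each ip gets NaN.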
BENY

You were close with the groupby. Creating a proper datetime column was probably the missing piece:

from datetime import datetime
import pandas

def row_to_date(row):
    today = datetime.today()
    return datetime(
        today.year,
        today.month,
        today.day,
        row['hour'],
        row['minute'],
        row['second']
    )


data = {
    'ip':[123, 326, 123, 326],
    'hour': [14, 12, 12, 1],
    'minute': [54, 23, 41, 8],
    'second': [45, 29, 19, 33]
}


df = (
    pandas.DataFrame(data)
        .assign(date=lambda df: df.apply(row_to_date, axis=1))
        .groupby(by=['ip'])
        .apply(lambda g: g['date'].diff().dt.total_seconds())
        .dropna()
        .to_frame('elapsed_seconds')
        .reset_index(level=1, drop=True)
)
df

And so I get:

     elapsed_seconds
ip                  
123          -8006.0
326         -40496.0
Paul H