0

I have a two datetime columns in a pandas dataframe and want to calculate the difference between them to see how long people have been online (so I have log in and log out datetime columns). I have entries for 24hours each day but only want to sum up the time difference for the time the user has been online between 8am and 12am.

So I want to set all entries of the login column to 8am if the entries are between 12am and 8am, and all entries of the logout column to 12am if the entries were made between 12am and 8am.

How do I only check for the time in a datetime column and then set it accordingly?

Brit
  • 1
  • 1
    https://stackoverflow.com/questions/16138744/extract-time-from-datetime-and-determine-if-time-not-date-falls-within-range can help – Sherar MDP Sep 07 '21 at 01:18

1 Answers1

1

First, I do not know whether your time data has been in datatime object, or still in string. If you have not, you could convert from string to datetime by the following code (suppose that your first column name 'time_in' and second column name 'time_out', and the data variable name data)

from datetime import datetime
data['time_in'] = data['time_in'].apply(lambda x: datetime.strptime(x, "%I%p"))
data['time_out'] = data['time_out'].apply(lambda x: datetime.strptime(x, "%I%p"))

(one note: as you give me your time at Hour_locale like 8am, so I do not include minute in converter. If you do have minute, then "%I%p" should be changed into "%I:M%p"). You could view all time format to convert here, for any future usage https://www.programiz.com/python-programming/datetime/strptime

When you print out your data at this stage, you could see the time will have the format of 1900/01/01 18:00:00 (convert from 6pm, for example). Do not worry, as you see this simply because when converting, the library do not receive date, so it automatically assigned to the first one. Just remind that for next step.

Now, you apply the changing to dataframe, simply as this: To change all login to 8am:

data.loc[(data['time_in']< datetime(year=1900,month=1,day=1,hour=8)) & (data['time_in'] > datetime(year=1900,month=1,day=1,hour=0)),'time_in'] = datetime(year=1900,month=1,day=1,hour=8)

To change all logout to 12am:

data.loc[(data['time_out']< datetime(year=1900,month=1,day=1,hour=8)) & (data['time_out'] > datetime(year=1900,month=1,day=1,hour=0)),'time_out'] = datetime(year=1900,month=1,day=1,hour=0)

Then, it will all set at this stage. If you want to convert back to string, using strftime() with similar usage to strptime

Long Doan
  • 303
  • 3
  • 10
  • Thank you, but I can't seem to get it to work. the type is already datetime, but I'm not able to only change the time in the timestamp depending on the condition. My dataframe looks like this: online_time_new offline_time_new 0 2021-09-01 15:51:00 2021-09-01 16:40:00 1 2021-09-02 22:44:00 2021-09-02 22:49:00 2 2021-08-30 03:56:00 2021-08-30 04:14:00 3 2021-08-30 05:36:00 2021-08-30 11:23:00 4 2021-09-02 07:26:00 2021-09-02 11:24:00 – Brit Sep 08 '21 at 12:26
  • Oh, so your data include year month date, so I believe at first you might want to create a new column with hour only (like data['online_time_hour'] = data['online_time_new'].apply(lambda x: datetime.change(year=1900, month=1, day=1)) (do the same with other column) to convert it first. After applying the condition, you could use the same technique to have a new data time. – Long Doan Sep 08 '21 at 14:37