I have a data frame with search traffic and write a code to get the first and last request per day and the time difference between it.
df = pd.read_csv("Testordner2/"+i, parse_dates=True)
df['new_time'] = pd.to_datetime(df['new_time'])
df['dates']= df['new_time'].dt.date
df['time'] = df['new_time'].dt.time
out = df.groupby(df['dates']).agg({'time': ['min', 'max']}) \
.stack(level=0).droplevel(1)
out['min_as_time_format'] = pd.to_datetime(out['min'], format="%H:%M:%S")
out['max_as_time_format'] = pd.to_datetime(out['max'], format="%H:%M:%S")
out['wh'] = out['max_as_time_format'] - out['min_as_time_format']
out['wh'].astype(str).str[-18:-10]
This works well and I get a dataframe out
that looks like:
min max wh
dates
2005-09-06 07:41:18 21:59:57 14:18:39
2005-09-12 14:49:22 14:49:22 00:00:00
2005-09-19 11:08:56 11:24:05 00:15:09
2005-09-21 21:19:21 21:20:15 00:00:54
2005-09-22 19:41:52 19:41:52 00:00:00
2005-10-13 11:22:07 21:05:41 09:43:34
2005-11-22 11:53:12 21:21:22 09:28:10
2005-11-23 00:07:01 14:08:50 14:01:49
2005-11-30 13:42:48 23:59:19 10:16:31
2005-12-01 00:05:16 10:24:12 10:18:56
2005-12-21 17:38:43 19:26:03 01:47:20
2005-12-22 09:20:07 11:25:40 02:05:33
2006-01-23 07:46:20 08:01:52 00:15:32
2006-04-27 16:27:54 19:29:52 03:01:58
2006-05-11 12:48:34 23:10:44 10:22:10
2006-05-15 10:14:59 22:28:12 12:13:13
2006-05-16 01:14:07 23:55:51 22:41:44
2006-05-17 01:12:45 23:57:56 22:45:11
2006-05-18 02:42:08 21:48:49 19:06:41
2006-05-22 00:00:29 23:07:12 23:06:43
2006-05-23 02:14:55 22:35:04 20:20:09
2006-05-24 11:53:08 21:25:39 09:32:31
2006-05-25 01:20:38 22:14:55 20:54:17
2006-05-29 01:34:09 23:53:33 22:19:24
The problem is I need a column dates
in my dataframe out
but this does not exist. I don't know why the column name "dates" is not in the same height as the other column names "min", "max" and "wh"..
I never had this problem before by using groupby but never use the egg
function before. Do not know if this is a reason for the problem ..
And the second question: I want to build the mean of the workhours in wh
per month.
I use:
out['month']= pd.PeriodIndex(out.dates, freq='M')
out2=out.groupby('month')['wh'].mean().reset_index(name='wh2')
But the values in wh
are no numeric data, so I can't use mean
.
How can I convert the whole column?