2

I have a data frame with search traffic and write a code to get the first and last request per day and the time difference between it.

df = pd.read_csv("Testordner2/"+i, parse_dates=True)
df['new_time'] = pd.to_datetime(df['new_time'])
df['dates']= df['new_time'].dt.date
df['time'] = df['new_time'].dt.time
out = df.groupby(df['dates']).agg({'time': ['min', 'max']}) \
                                 .stack(level=0).droplevel(1)
out['min_as_time_format'] = pd.to_datetime(out['min'], format="%H:%M:%S")
out['max_as_time_format'] = pd.to_datetime(out['max'], format="%H:%M:%S")
out['wh'] = out['max_as_time_format'] - out['min_as_time_format']
out['wh'].astype(str).str[-18:-10]

This works well and I get a dataframe out that looks like:

                 min       max       wh
dates                                  
2005-09-06  07:41:18  21:59:57 14:18:39
2005-09-12  14:49:22  14:49:22 00:00:00
2005-09-19  11:08:56  11:24:05 00:15:09
2005-09-21  21:19:21  21:20:15 00:00:54
2005-09-22  19:41:52  19:41:52 00:00:00
2005-10-13  11:22:07  21:05:41 09:43:34
2005-11-22  11:53:12  21:21:22 09:28:10
2005-11-23  00:07:01  14:08:50 14:01:49
2005-11-30  13:42:48  23:59:19 10:16:31
2005-12-01  00:05:16  10:24:12 10:18:56
2005-12-21  17:38:43  19:26:03 01:47:20
2005-12-22  09:20:07  11:25:40 02:05:33
2006-01-23  07:46:20  08:01:52 00:15:32
2006-04-27  16:27:54  19:29:52 03:01:58
2006-05-11  12:48:34  23:10:44 10:22:10
2006-05-15  10:14:59  22:28:12 12:13:13
2006-05-16  01:14:07  23:55:51 22:41:44
2006-05-17  01:12:45  23:57:56 22:45:11
2006-05-18  02:42:08  21:48:49 19:06:41
2006-05-22  00:00:29  23:07:12 23:06:43
2006-05-23  02:14:55  22:35:04 20:20:09
2006-05-24  11:53:08  21:25:39 09:32:31
2006-05-25  01:20:38  22:14:55 20:54:17
2006-05-29  01:34:09  23:53:33 22:19:24

The problem is I need a column dates in my dataframe out but this does not exist. I don't know why the column name "dates" is not in the same height as the other column names "min", "max" and "wh".. I never had this problem before by using groupby but never use the eggfunction before. Do not know if this is a reason for the problem ..

And the second question: I want to build the mean of the workhours in wh per month. I use:

out['month']= pd.PeriodIndex(out.dates, freq='M')
out2=out.groupby('month')['wh'].mean().reset_index(name='wh2')

But the values in wh are no numeric data, so I can't use mean. How can I convert the whole column?

Melli
  • 65
  • 6
  • `df = df.reset_index()` allows you to have it back – nidabdella Aug 28 '19 at 11:17
  • Thank you! I edited my post because I had a second question and has to wait 90 min to do my next question. Can you also help me with this problem ? – Melli Aug 28 '19 at 11:52
  • Welcome to Stack Overflow! You're more likely to get an answer to your second question if you post it separately, as a new question. Just gotta wait a few more minutes... – jkdev Aug 28 '19 at 12:20

1 Answers1

2

Simply reset the index:

out.reset_index(inplace=True)

Groupby and Average Datetime:

grouper = out.groupby('month')
int_mean = grouper['wh'].apply(lambda x: x.astype(int).mean())
mean = int_mean.apply(pd.to_datetime)
Ted
  • 1,189
  • 8
  • 15
  • Thank you! I edited my post because I had a second question and has to wait 90 min to do my next question. Can you also help me with this problem ? – Melli Aug 28 '19 at 11:38
  • @Melli No problem, happy to help. Sure, see latest edit. – Ted Aug 29 '19 at 09:31