1

I am preprocessing apache server log data. I have 3 columns ID, TIME, and BYTES. Example:

ID &nbsp &nbsp TIME &nbsp &nbsp BYTES

1 &nbsp &nbsp 13:00 &nbsp &nbsp 10

2 &nbsp &nbsp 13:02 &nbsp &nbsp 30

3 &nbsp &nbsp 13:03 &nbsp &nbsp 40

4 &nbsp &nbsp 13:02 &nbsp &nbsp 50

5 &nbsp &nbsp 13:03 &nbsp &nbsp 70

I want to achieve something like this:

ID &nbsp &nbsp TIME &nbsp &nbsp BYTES

1 &nbsp &nbsp 13:00 &nbsp &nbsp 10

2 &nbsp &nbsp 13:02 &nbsp &nbsp 80

3 &nbsp &nbsp 13:03 &nbsp &nbsp 110

  • `df.groupby('TIME', as_index=False).agg({'ID': 'min', 'BYTES': 'sum'})` would do. – Zero Dec 28 '17 at 19:09
  • Indeed. @Zero, can you find a dupe? – cs95 Dec 28 '17 at 19:11
  • It messed up the time. The time now starts from 0:00 in some bizarre pattern. In my case, the ID doesn't really matter. So, it's just TIME and BYTES. I want the output to be like I've shown(because i'll plot a graph against TIME). The output must be in the order of TIME as shown. What changes do you suggest, @Zero? –  Dec 28 '17 at 19:23
  • What is the dtype of TIME? – Scott Boston Dec 28 '17 at 19:24
  • `df.groupby('TIME')[['BYTES']].sum().plot()`? – cs95 Dec 28 '17 at 19:25
  • I ran this `server_logs["TIME"] = pd.to_datetime(server_logs["TIME"]) ` to convert time. `dtype` of TIME is `datetime64[ns]` @ScottBoston –  Dec 28 '17 at 19:31
  • @cᴏʟᴅsᴘᴇᴇᴅ, newbie here. I'm trying to find the reply in bytes at all unique times. Can you please explain what exactly does this `df.groupby('TIME')[['BYTES']].sum().plot()` do? –  Dec 28 '17 at 19:36
  • Does it work? If it doesn't, explaining it is pointless. :p – cs95 Dec 28 '17 at 19:37
  • What it does is group by unique timestamps, find the sum of BYTES for each timestamp, leaving the result with TIME as the index. After, `plot` will plot BYTES with respect to TIME. If you want to see the plot, call `plt.show()`. – cs95 Dec 28 '17 at 19:38
  • If you're still getting weird results, convert TIME to a string first - `df.assign(TIME=df.TIME.astype(str).groupby('TIME')[['BYTES']].sum().plot()`. – cs95 Dec 28 '17 at 19:39
  • It has been done. See my other two comments. – cs95 Dec 28 '17 at 19:44
  • @cᴏʟᴅsᴘᴇᴇᴅ it works. :p –  Dec 28 '17 at 19:44
  • This seemed like a dupe, and your question has been solved. Cheers! – cs95 Dec 28 '17 at 19:45
  • The time got weird. @cᴏʟᴅsᴘᴇᴇᴅ , `df.assign(TIME=df.TIME.astype(str).groupby('TIME')[['BYTES']‌​].sum().plot()` this gave out an error. ` ax = unique_time_bytes.assign(TIME=df.TIME.astype(str).groupby('TIME')[['REPLY_IN_BYTES']‌​].sum()SyntaxError: invalid character in identifier` –  Dec 28 '17 at 20:34
  • Copying the code from comments is a problem. Just type it out. – cs95 Dec 28 '17 at 20:50
  • But it was working earlier(without converting TIME to string). Your answer worked too. All of a sudden, it went back to weird TIME formatting. –  Dec 28 '17 at 20:52
  • Sounding like a data issue to me. – Scott Boston Dec 28 '17 at 21:00

1 Answers1

0

Let's try:

df['TIME'] = pd.to_datetime(df['TIME'])
ax = df.groupby('TIME')['BYTES'].sum().plot()
ax.set_xlim('13:00:00','13:03:00')

Output:

enter image description here

Scott Boston
  • 147,308
  • 15
  • 139
  • 187