This was the original data.

| ID | TIME  | BYTES |
|----|-------|-------|
| 1  | 13:00 | 10    |
| 2  | 13:02 | 30    |
| 3  | 13:03 | 40    |
| 4  | 13:02 | 50    |
| 5  | 13:03 | 70    |


I got the following data using `ax = server_logs.groupby('TIME')['REPLY_SIZE'].sum()`.

| ID | TIME  | BYTES |
|----|-------|-------|
| 1  | 13:00 | 10    |
| 2  | 13:02 | 80    |
| 3  | 13:03 | 110   |

How do I separate TIME and BYTES into two different lists after doing this? It doesn't seem to separate using `time = ax[0]`.

P.S. I'd like to apply k-means clustering to this data with sklearn afterwards.
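For the clustering step, here is a minimal sketch of how the grouped result could be turned into a numeric feature array. It assumes TIME holds "HH:MM" strings as in the table above (if the column already contains timestamps, the parsing step is unnecessary), and the names `v`, `minutes` and `X` are only illustrative.

```python
import numpy as np
import pandas as pd

# Grouped result from the question, rebuilt by hand for illustration.
v = pd.Series(
    [10, 80, 110],
    index=pd.Index(["13:00", "13:02", "13:03"], name="TIME"),
    name="BYTES",
)

# Assumption: TIME values are "HH:MM" strings. Parse them and convert
# each one to minutes since midnight so that it becomes a plain number.
t = pd.to_datetime(v.index, format="%H:%M")
minutes = (t.hour * 60 + t.minute).to_numpy()

# One row per time stamp, two columns: (minutes, summed bytes).
X = np.column_stack([minutes, v.to_numpy()])
print(X)
```

An array shaped `(n_samples, n_features)` like `X` is the kind of input `sklearn.cluster.KMeans(n_clusters=...).fit(X)` accepts. The two columns are on quite different scales, so scaling them first may be worth considering.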

  • How about: `v = df.groupby('TIME')['BYTES'].sum(); a, b = v.index.tolist(), v.tolist()`? – cs95 Dec 29 '17 at 14:00
  • No, they’re timestamp objects. You can plot them without any issues. And yes, that’s what it does. Also, what’s wrong with using pd.Series.plot? Like I suggested in your last question? – cs95 Dec 29 '17 at 14:27
  • Yep. I plotted it. It works. I can't find your pd.Series.plot in my previous question. Can you please post it here? –  Dec 29 '17 at 14:45
  • Check here: https://stackoverflow.com/questions/48013007/group-duplicate-columns-and-sum-the-corresponding-column-values-using-pandas#comment82995023_48013007 – cs95 Dec 29 '17 at 14:48
  • You could've just done `df.groupby('TIME')['BYTES'].sum().plot()` :p – cs95 Dec 29 '17 at 14:49
  • I wanted a scatter plot. For a scatter plot we need two inputs, right? `plt.scatter(a,b)`. When I _plot_ a scatter plot, the x-axis (time) range is very large compared to where the actual data is. I'll put up a picture. Why is that happening, though? –  Dec 29 '17 at 14:56
  • That seems to be a matplotlib issue, you can fix it by setting some parameter (I'm not an expert, so I have no clue which). Alternatively, pandas has a `plot.scatter` function that you can call. – cs95 Dec 29 '17 at 15:04
  • `v.plot.scatter()` doesn't work. Neither does `v.plot.scatter(a, b)`. Both give an error: `AttributeError: 'SeriesPlotMethods' object has no attribute 'scatter'`. –  Dec 29 '17 at 15:12

2 Answers

1

The answer as given by @COLDSPEED.

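# Sum BYTES per distinct TIME; the result is a Series indexed by TIME.
v = df.groupby('TIME')['BYTES'].sum()
# The index holds the times, the values hold the summed bytes.
a, b = v.index.tolist(), v.tolist()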
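
To make that concrete, here is a small self-contained sketch using the sample rows from the question. The DataFrame is rebuilt by hand, and the names `times` and `sizes` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "TIME": ["13:00", "13:02", "13:03", "13:02", "13:03"],
    "BYTES": [10, 30, 40, 50, 70],
})

# Summing a single column per group returns a Series:
# the group keys (TIME) become the index, the sums become the values.
v = df.groupby("TIME")["BYTES"].sum()

times = v.index.tolist()  # ['13:00', '13:02', '13:03']
sizes = v.tolist()        # [10, 80, 110]
print(times, sizes)
```

Because the times live in the index rather than in a column, `ax[0]` looks for an index label `0` instead of returning the first column.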
0
time=ax[:,0]
bytes=ax[:,1]

Can you try that?

If that doesn't work, then this should

time=ax["TIME"]
bytes=ax["BYTES"]
Srihari
  • With `time = ax[:,0]`, I got the following error `Can only tuple-index with a MultiIndex`. With `time=ax["TIME"]`, I got this error: `Can only tuple-index with a MultiIndex`. –  Dec 29 '17 at 13:41
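
A possible explanation for those errors, sketched with the sample data from the question: the grouped result is a one-dimensional Series, so there are no columns to slice with `[:, 0]` and no "TIME" label among its index values. Calling `reset_index()` turns it into a two-column DataFrame, after which column access works. The column name `BYTES` follows the question's table and is an assumption about the real data.

```python
import pandas as pd

df = pd.DataFrame({
    "TIME": ["13:00", "13:02", "13:03", "13:02", "13:03"],
    "BYTES": [10, 30, 40, 50, 70],
})

# A Series: one dimension, with TIME as the index.
ax = df.groupby("TIME")["BYTES"].sum()

# reset_index() moves TIME out of the index into a regular column,
# giving a DataFrame with the columns TIME and BYTES.
grouped = ax.reset_index()

time_list = grouped["TIME"].tolist()
bytes_list = grouped["BYTES"].tolist()
print(time_list, bytes_list)
```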