Separate columns into lists after applying df.groupby() in pandas

Question

This was the original data.

ID  TIME   BYTES
1   13:00  10
2   13:02  30
3   13:03  40
4   13:02  50
5   13:03  70

I got the following data using `ax = server_logs.groupby('TIME')['REPLY_SIZE'].sum()`.

ID  TIME   BYTES
1   13:00  10
2   13:02  80
3   13:03  110

How do I separate TIME and BYTES into two different lists after doing this? Using `time = ax[0]` doesn't seem to separate them.

P.S. I'd like to apply k-means clustering using sklearn on this data afterwards.

How about: `v = df.groupby('TIME')['BYTES'].sum(); a, b = v.index.tolist(), v.tolist()`? — cs95, Dec 29 '17 at 14:00
No, they’re timestamp objects. You can plot them without any issues. And yes, that’s what it does. Also, what’s wrong with using pd.Series.plot? Like I suggested in your last question? — cs95, Dec 29 '17 at 14:27
Yep. I plotted it. It works. I can't find your pd.Series.plot in my previous question. Can you please post it here? — , Dec 29 '17 at 14:45
Check here: https://stackoverflow.com/questions/48013007/group-duplicate-columns-and-sum-the-corresponding-column-values-using-pandas#comment82995023_48013007 — cs95, Dec 29 '17 at 14:48
You could've just done `df.groupby('TIME')['BYTES'].sum().plot()` :p — cs95, Dec 29 '17 at 14:49
I wanted a scatter plot. For a scatter plot we need two inputs, right? `plt.scatter(a,b)`. When I _plot_ a scatter plot, the x-axis (time) range is very large compared to where the actual data is. I'll put up a picture. Why is that happening tho? — , Dec 29 '17 at 14:56
That seems to be a matplotlib issue, you can fix it by setting some parameter (I'm not an expert, so I have no clue which). Alternatively, pandas has a `plot.scatter` function that you can call. — cs95, Dec 29 '17 at 15:04
`v.plot.scatter()` doesn't work. Neither does `v.plot.scatter(a, b)`. Both give an error : `AttributeError: 'SeriesPlotMethods' object has no attribute 'scatter'`. — , Dec 29 '17 at 15:12

score 1 · Accepted Answer · answered Jan 09 '18 at 13:22

The answer as given by @COLDSPEED.

v = df.groupby('TIME')['BYTES'].sum()
a, b = v.index.tolist(), v.tolist()
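
To see why this works, here is a minimal runnable sketch using the sample rows from the question. It assumes the TIME values are plain strings (in the asker's real data they are apparently timestamp objects) and uses the illustrative names `times` and `byte_totals` in place of `a` and `b`. The grouped sum is a Series whose index holds the TIME keys and whose values hold the summed BYTES, so each side converts to a list independently.

```python
import pandas as pd

# Sample rows copied from the question; TIME is kept as strings here,
# which is an assumption made only for this illustration.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "TIME": ["13:00", "13:02", "13:03", "13:02", "13:03"],
    "BYTES": [10, 30, 40, 50, 70],
})

# Summing one column after groupby returns a Series:
# the group keys (TIME) become the index, the sums become the values.
v = df.groupby("TIME")["BYTES"].sum()

times = v.index.tolist()   # group keys, sorted by groupby's default sort
byte_totals = v.tolist()   # summed BYTES for each key

print(times)        # ['13:00', '13:02', '13:03']
print(byte_totals)  # [10, 80, 110]
```

The two lists stay aligned position by position, so they can be passed together to something like `plt.scatter(times, byte_totals)`.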

score 0 · Answer 2 · answered Dec 29 '17 at 13:35

time=ax[:,0]
bytes=ax[:,1]

Can you try that?

If that doesn't work, then this should

time=ax["TIME"]
bytes=ax["BYTES"]

Srihari

With `time = ax[:,0]`, I got the following error `Can only tuple-index with a MultiIndex`. With `time=ax["TIME"]`, I got this error: `Can only tuple-index with a MultiIndex`. – Dec 29 '17 at 13:41
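
The errors reported in this comment are consistent with `ax` being a one-dimensional Series rather than a two-column DataFrame: TIME is the index, not a column, so there is nothing to select by column position or by column name. One possible way to make column-style access work is to call `reset_index()` first, as in the sketch below. It reuses the question's sample rows with string TIME values (an assumption), and the names `grouped`, `time_list`, and `bytes_list` are illustrative only.

```python
import pandas as pd

# Sample rows from the question (TIME as strings is an assumption).
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "TIME": ["13:00", "13:02", "13:03", "13:02", "13:03"],
    "BYTES": [10, 30, 40, 50, 70],
})

ax = df.groupby("TIME")["BYTES"].sum()
print(type(ax).__name__)  # Series: one dimension, TIME lives in the index

# reset_index() moves the TIME index back into an ordinary column,
# producing a DataFrame with columns TIME and BYTES.
grouped = ax.reset_index()

time_list = grouped["TIME"].tolist()
bytes_list = grouped["BYTES"].tolist()

print(time_list)   # ['13:00', '13:02', '13:03']
print(bytes_list)  # [10, 80, 110]
```

Passing `as_index=False` to `groupby` is another way to get a DataFrame directly instead of a Series.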
