Pandas Binning and Group By

Question

I have a feeling this is very simple, but I am having a major issue with it.

Say I have the following dataframe in pandas.

     price  ordersize
0  0.139664   6.051679
1  0.139665   2.358634
2  0.139665   2.618828
3  0.139665  27.240000
4  0.139665   0.040661
5  0.140060   3.000000
6  0.140100   1.463016
7  0.140128   0.020000
8  0.140418  85.000000
9  0.140427   7.000000

This is an order book for BCHBTC.

As you can see, the rows at indices 1 through 4 are all orders at the same price (0.139665).

I need to take this input and combine the orders at each price level, so that it outputs another dataframe like this:

     price  ordersize
0  0.139664   6.051679
1  0.139665   32.258123
2  0.140060   3.000000
3  0.140100   1.463016
4  0.140128   0.020000
5  0.140418  85.000000
6  0.140427   7.000000

I have tried using groupby and other things, but it either doesn't give me the correct output or gives it in a very weird format that is hard to work with.

If I could get some help with this, that would be much appreciated.

What are the types of the `price` and `ordersize` columns? Are they `str` and `float`? – mr.tarsa Feb 01 '18 at 21:24

Graipher · Answer 1 · 2018-02-01T21:43:55.973


You could use groupby and sum:

df2 = df.groupby("price").sum()

This gives you the summed ordersize for each price, with price as the index. If you want price back as a column, you can use reset_index:

df2.reset_index(level=0, inplace=True)
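A self-contained sketch of this approach, rebuilding the sample data from the question (column names and values as shown there). It uses the non-inplace form of reset_index, which returns a new DataFrame:

```python
import pandas as pd

# Sample order book from the question
df = pd.DataFrame({
    "price": [0.139664, 0.139665, 0.139665, 0.139665, 0.139665,
              0.140060, 0.140100, 0.140128, 0.140418, 0.140427],
    "ordersize": [6.051679, 2.358634, 2.618828, 27.240000, 0.040661,
                  3.000000, 1.463016, 0.020000, 85.000000, 7.000000],
})

# Sum order sizes per price; price becomes the index of the result
df2 = df.groupby("price").sum()

# Move price back into a regular column (returns a new DataFrame)
df2 = df2.reset_index()
print(df2)
```

The four rows at 0.139665 collapse into one row whose ordersize is their total, leaving 7 rows.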

edited Feb 01 '18 at 21:43

answered Feb 01 '18 at 21:27

Graipher


O_o It didn't do anything. – xxen0nxx Feb 01 '18 at 21:34
@xxen0nxx Then maybe your prices are not actually exactly the same, just up to the precision shown? – Graipher Feb 01 '18 at 21:35
Let me try this again one sec. – xxen0nxx Feb 01 '18 at 21:43
Very weird; it is just outputting the same as the previous dataframe. The price precision is as shown. I am just trying to get the total amount of coins at each price level. – xxen0nxx Feb 01 '18 at 21:47
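Graipher's suspicion can be checked directly. A minimal sketch with made-up prices that differ only beyond the sixth decimal place: at pandas' default display precision of 6 decimals they can print identically, yet groupby treats them as separate keys until they are rounded. Rounding to 6 decimals is an assumption based on the precision shown in the question.

```python
import pandas as pd

# Hypothetical prices: all round to 0.139665, but are not exactly equal
df = pd.DataFrame({
    "price": [0.139665, 0.1396650001, 0.1396649999],
    "ordersize": [2.358634, 2.618828, 27.240000],
})

# Three distinct float keys, so a plain groupby would return three rows
print(df["price"].nunique())

# Round to the precision that matters before grouping
rounded = df.assign(price=df["price"].round(6))
out = rounded.groupby("price", as_index=False)["ordersize"].sum()
print(out)
```

If the exchange's tick size is known, rounding to that tick is probably a more principled choice than a fixed number of decimals.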

Scott Boston · Answer 2 · 2018-02-01T22:02:26.033


Use groupby with as_index=False:

df = df.groupby('price', as_index=False)['ordersize'].sum()
print(df)

Output:

      price  ordersize
0  0.139664   6.051679
1  0.139665  32.258123
2  0.140060   3.000000
3  0.140100   1.463016
4  0.140128   0.020000
5  0.140418  85.000000
6  0.140427   7.000000
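The two answers produce the same shape of result. A small sketch (using a subset of the question's data) comparing as_index=False with the groupby-then-reset_index route:

```python
import pandas as pd

df = pd.DataFrame({
    "price": [0.139664, 0.139665, 0.139665, 0.140060],
    "ordersize": [6.051679, 2.358634, 2.618828, 3.000000],
})

# Answer 2: keep price as a column directly
a = df.groupby("price", as_index=False)["ordersize"].sum()

# Answer 1 style: price becomes the index, then is reset to a column
b = df.groupby("price")["ordersize"].sum().reset_index()

# Both give the same columns, values and default integer index
pd.testing.assert_frame_equal(a, b)
print(a)
```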

edited Feb 01 '18 at 22:02

answered Feb 01 '18 at 21:35

Scott Boston


Did not work; it just gave me a dataframe with the same contents as the input. – xxen0nxx Feb 01 '18 at 21:55
@xxen0nxx In this example, you have four input rows at price 0.139665, and the output has one row for that price: 10 records become 7. Do you need to create price ranges? – Scott Boston Feb 01 '18 at 21:57
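If price ranges rather than exact prices are what's wanted (binning in the literal sense), pd.cut can assign each row to an interval before grouping. A sketch on the question's data, assuming arbitrary bin edges 0.0001 apart, chosen only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "price": [0.139664, 0.139665, 0.139665, 0.139665, 0.139665,
              0.140060, 0.140100, 0.140128, 0.140418, 0.140427],
    "ordersize": [6.051679, 2.358634, 2.618828, 27.240000, 0.040661,
                  3.000000, 1.463016, 0.020000, 85.000000, 7.000000],
})

# Hypothetical bin edges 0.1396, 0.1397, ..., 0.1405 (10 edges, 9 bins)
edges = np.round(0.1396 + 0.0001 * np.arange(10), 4)

# Label each order with the half-open interval (left, right] it falls into
df["price_bin"] = pd.cut(df["price"], bins=edges)

# Total order size per price range; observed=False keeps empty bins (sum 0)
binned = df.groupby("price_bin", observed=False)["ordersize"].sum()
print(binned)
```

Note that pd.cut intervals are closed on the right by default, so a price exactly equal to an edge lands in the bin that ends at that edge.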
When I print the dataframe, though, it gives me exactly the same dataframe I fed into it. It's like the groupby is just not working. It is very confusing. – xxen0nxx Feb 01 '18 at 21:59

@xxen0nxx You are not assigning the result back. Try df = df.groupby(...) and then print(df). groupby is not an in-place operation, so you need to assign its result to a variable. – Scott Boston Feb 01 '18 at 22:00
Here is the code I have so far:

import time

# bchbtc_asks is populated elsewhere (not shown)
while True:
    top_of_book = bchbtc_asks.head(10)
    print(type(top_of_book))
    # top_of_book = top_of_book.groupby.size()
    price_mean = bchbtc_asks["price"].mean()
    df2 = top_of_book.groupby('price', as_index=False)['ordersize'].sum()
    print(type(df2))
    time.sleep(5)

– xxen0nxx Feb 01 '18 at 22:04
Okay, right now you are printing the types (print(type(df2))), not the data. Did you want to look at the data? – Scott Boston Feb 01 '18 at 22:11
Yes, I just did print(df2) and it gave me the same dataframe as the input dataframe. – xxen0nxx Feb 01 '18 at 22:16
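A sketch of the loop body as a standalone function, which makes it easier to inspect the aggregated data rather than its type. The helper name summarize_top_of_book and the small stand-in DataFrame are hypothetical; the real bchbtc_asks comes from code not shown in the question.

```python
import pandas as pd

def summarize_top_of_book(asks: pd.DataFrame, depth: int = 10) -> pd.DataFrame:
    """Hypothetical helper: total ordersize per price over the first `depth` rows."""
    top_of_book = asks.head(depth)
    # groupby(...).sum() returns a NEW DataFrame; `asks` is left unchanged
    return top_of_book.groupby("price", as_index=False)["ordersize"].sum()

# Stand-in for bchbtc_asks (the real data source is not shown)
asks = pd.DataFrame({
    "price": [0.139664, 0.139665, 0.139665, 0.140060],
    "ordersize": [6.051679, 2.358634, 2.618828, 3.000000],
})

df2 = summarize_top_of_book(asks)
print(df2)                   # print the data itself, not type(df2)
print(len(asks), len(df2))   # the input still has all of its rows
```

Inside the while True loop, each iteration would call the helper on the latest asks and sleep afterwards, as in the original code.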

score 0 · Answer 3 · answered Feb 01 '18 at 22:57


Figured it out.

I was using the wrong data type, and the float values were being rounded to the wrong decimal place. It turns out I may not even need groupby anymore.
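The exact data type problem isn't stated, but mr.tarsa's question about str columns points at one common cause. A sketch assuming the prices arrive as strings (for example from a JSON API): textually different strings for the same number become separate group keys, and converting with pd.to_numeric before grouping fixes that.

```python
import pandas as pd

# Assumption: prices and sizes arrive as strings, e.g. from a JSON response
raw = pd.DataFrame({
    "price": ["0.139665", "0.13966500", "0.139665"],
    "ordersize": ["2.358634", "2.618828", "27.240000"],
})

# As strings, "0.139665" and "0.13966500" are different group keys
print(raw.groupby("price").size())

# Convert both columns to float before grouping
clean = raw.assign(
    price=pd.to_numeric(raw["price"]),
    ordersize=pd.to_numeric(raw["ordersize"]),
)
totals = clean.groupby("price", as_index=False)["ordersize"].sum()
print(totals)
```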

answered Feb 01 '18 at 22:57

xxen0nxx

