I have a feeling this is very simple, but I am having a major issue with it.

Say I have the following dataframe in pandas.

     price  ordersize
0  0.139664   6.051679
1  0.139665   2.358634
2  0.139665   2.618828
3  0.139665  27.240000
4  0.139665   0.040661
5  0.140060   3.000000
6  0.140100   1.463016
7  0.140128   0.020000
8  0.140418  85.000000
9  0.140427   7.000000

This is an order book for BCHBTC.

As you can see, from index 1 to index 4 there are several orders at the same price.

I need to bin this input by price so that it produces another dataframe like this.

     price  ordersize
0  0.139664   6.051679
1  0.139665  32.258123
2  0.140060   3.000000
3  0.140100   1.463016
4  0.140128   0.020000
5  0.140418  85.000000
6  0.140427   7.000000

I have tried using groupby and other things, but it either does not give me the correct output or gives it in a very odd format that is hard to work with.

If I could get some help with this that would be much appreciated.

Vadim Kotov
xxen0nxx

3 Answers

You could use groupby and sum:

df2 = df.groupby("price").sum()

This gets you the new ordersize, with the price as index. If you want the index back as a column, you can use reset_index:

df2.reset_index(level=0, inplace=True)
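
As a rough, self-contained sketch of the two steps above: the DataFrame below is rebuilt by hand from the table in the question, and the name `df2` follows this answer. Here `reset_index()` is assigned back rather than called with `inplace=True`, which is only a stylistic choice.

```python
import pandas as pd

# Order book rows copied from the question.
df = pd.DataFrame({
    "price": [0.139664, 0.139665, 0.139665, 0.139665, 0.139665,
              0.140060, 0.140100, 0.140128, 0.140418, 0.140427],
    "ordersize": [6.051679, 2.358634, 2.618828, 27.240000, 0.040661,
                  3.000000, 1.463016, 0.020000, 85.000000, 7.000000],
})

# Sum the order sizes of all rows that share a price; price becomes the index.
df2 = df.groupby("price").sum()

# Move price back into an ordinary column with a fresh integer index.
df2 = df2.reset_index()

print(df2)
```

The ten input rows should collapse to seven, with the four orders at 0.139665 merged into one row.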
Graipher

Use groupby with as_index=False:

df = df.groupby('price', as_index=False)['ordersize'].sum()
print(df)

Output:

      price  ordersize
0  0.139664   6.051679
1  0.139665  32.258123
2  0.140060   3.000000
3  0.140100   1.463016
4  0.140128   0.020000
5  0.140418  85.000000
6  0.140427   7.000000
Scott Boston
  • Did not work; it just gave me a dataframe identical to the input. – xxen0nxx Feb 01 '18 at 21:55
  • @xxen0nxx In this example, you have four input values for price 0.139665, and the output has one value for that price: from 10 records to 7 records. Do you need to create price ranges? – Scott Boston Feb 01 '18 at 21:57
  • When I print the dataframe, though, it gives me exactly the same dataframe I fed into it. It's as if the groupby is just not working. It is very confusing. – xxen0nxx Feb 01 '18 at 21:59
  • @xxen0nxx You are not assigning the result back. Try df = df.groupby(...), then print(df). groupby is not an in-place operation; you need to assign its result to a variable. – Scott Boston Feb 01 '18 at 22:00
  • Here is the code I have so far:

        while True:
            top_of_book = bchbtc_asks.head(10)
            print(type(top_of_book))
            # top_of_book = top_of_book.groupby.size()
            price_mean = bchbtc_asks["price"].mean()
            df2 = top_of_book.groupby('price', as_index=False)['ordersize'].sum()
            print(type(df2))
            time.sleep(5)

    – xxen0nxx Feb 01 '18 at 22:04
  • Okay, you are getting types back right now. Did you want to look at the data? – Scott Boston Feb 01 '18 at 22:11
  • Yes, I did just print(df2) and it gave me the same dataframe as the input dataframe. – xxen0nxx Feb 01 '18 at 22:16
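
The point made in the comments, that groupby returns a new object and leaves the original untouched, can be seen in a minimal sketch. The four rows are a subset of the question's data, `top_of_book` mirrors the name in the posted loop, and `binned` is a hypothetical name used only for illustration.

```python
import pandas as pd

# Small illustrative frame: two of the four rows share a price.
top_of_book = pd.DataFrame({
    "price": [0.139664, 0.139665, 0.139665, 0.140060],
    "ordersize": [6.051679, 2.358634, 2.618828, 3.000000],
})

# The result is discarded here, so top_of_book still has all four rows.
top_of_book.groupby("price", as_index=False)["ordersize"].sum()
print(len(top_of_book))

# Keeping the returned frame is what makes the aggregation visible.
binned = top_of_book.groupby("price", as_index=False)["ordersize"].sum()
print(binned)
```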

Figured it out.

I was using the wrong datatype and it was rounding the float numbers to the wrong decimal place. Turns out I may not even need to use group by anymore.
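
The post does not say exactly what the datatype problem was, so the following is only a guess at one scenario that produces the same symptom. If prices differ beyond the displayed precision, groupby treats them as distinct keys and the printed result looks unchanged. The prices below are made up, and the six-decimal tick size is an assumption.

```python
import pandas as pd

# Hypothetical prices that differ only beyond the 6th decimal place.
df = pd.DataFrame({
    "price": [0.1396650001, 0.1396650002, 0.1400600000],
    "ordersize": [2.358634, 2.618828, 3.000000],
})

# Grouping on the raw floats keeps every row, because each key is distinct.
raw = df.groupby("price", as_index=False)["ordersize"].sum()

# Rounding to the assumed tick size first merges the near-equal prices.
binned = (
    df.assign(price=df["price"].round(6))
      .groupby("price", as_index=False)["ordersize"]
      .sum()
)

print(len(raw), len(binned))
```

Whether rounding is appropriate depends on the exchange's actual price precision, which the thread does not state.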

xxen0nxx