Fastest way to label rows where one column is between the values of several other columns

Question

I have a pandas.DataFrame like this:

The first column is the theoretical price I would pay for something because that's what I think it is worth to me. The other columns are the average price that other people are trying to buy it at (global_buy_price) and the average price that other people are trying to sell at (global_sell_price) and the middle of the two, the global_middle_price.

What is the fastest way to create the column called I bought which will contain a 1 if I bought and a 0 if I didn't buy?

The way I know if I bought is that :

If my_buy_price is > than the global_sell_price, then there's a 100% chance I bought.
If my_buy_price is between global_middle_price and global_sell_price, then 80% chance that I bought
If my_buy_price is between global_buy_price and global_middle_price, then 20% chance that I bought
If my_buy_price is below global_buy_price, then there's a 0% chance that I bought

The way I currently do this is by writing an if-elif-elif-elif function and then using pandas apply to create the I bought column, which is very slow becuase I have 1,000,000 rows. Is there some much faster/more correct way of doing this?

This is the code that generates the same pandas.DataFrame:

import pandas
import numpy

numpy.random.seed(1)

cols = ['my_buy_price', 'global_buy_price', 'global_middle_price', 'global_sell_price']

df   = pandas.DataFrame(index = range(10), columns=cols) 

df['global_buy_price']    = 400
df['global_sell_price']   = 500
df['global_middle_price'] = df[['global_buy_price', 'global_sell_price']].mean(axis=1)

df['my_buy_price']  = [numpy.random.randint(350, 550) for x in range(len(df))]

df['I bought'] = '?'

print(df)

How about some sample data, that does not require us to type stuff in from a picture? In these cases, pictures are almost worthless, please don't. — Stephen Rauch, Feb 01 '18 at 04:59
https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — Stephen Rauch, Feb 01 '18 at 05:01
How do you want to display the values which have chances for eg. 20%,80% etc. because you mentioned that the column will contain either 1 or 0. — DeadCoderz, Feb 01 '18 at 05:39
@StephenRauch I provided code to regenerate the picture in my original question. @DeadCoderz If the crossing happens between `global_buy_price` and `global_middle_price` then it should put 1 into the `I bought` column 20% of the time. — user1367204, Feb 01 '18 at 12:54

score 1 · Accepted Answer · answered Feb 01 '18 at 05:17

Since you mention between , I will create a new df and using idxmax

N=(df['my_buy_price']>df['global_sell_price']).to_frame('100%')
N['80%']=df['my_buy_price'].between(df['global_middle_price'],df['global_sell_price'])
N['20%']=df['my_buy_price'].between(df['global_buy_price'],df['global_middle_price'])
N['0%']=df['my_buy_price']<df['global_buy_price']
N.idxmax(1)
Out[279]: 
0      0%
1     80%
2     20%
3     80%
4     80%
5     20%
6    100%
7     80%
8     80%
9     20%
dtype: object

Fastest way to label rows where one column is between the values of several other columns

1 Answers1