I have a pandas.DataFrame like this:
The first column (my_buy_price) is the theoretical price I would pay for something, because that is what I think it is worth to me. The other columns are the average price at which other people are trying to buy it (global_buy_price), the average price at which other people are trying to sell it (global_sell_price), and the midpoint of the two (global_middle_price).
What is the fastest way to create the column called I bought, which will contain a 1 if I bought and a 0 if I didn't buy?
The way I know whether I bought is:
If my_buy_price is greater than global_sell_price, then there's a 100% chance that I bought.
If my_buy_price is between global_middle_price and global_sell_price, then there's an 80% chance that I bought.
If my_buy_price is between global_buy_price and global_middle_price, then there's a 20% chance that I bought.
If my_buy_price is below global_buy_price, then there's a 0% chance that I bought.
The way I currently do this is by writing an if-elif-elif-elif function and then using pandas apply to create the I bought column, which is very slow because I have 1,000,000 rows. Is there a much faster or more correct way of doing this?
This is the code that generates the same pandas.DataFrame:
import pandas
import numpy
numpy.random.seed(1)
cols = ['my_buy_price', 'global_buy_price', 'global_middle_price', 'global_sell_price']
df = pandas.DataFrame(index = range(10), columns=cols)
df['global_buy_price'] = 400
df['global_sell_price'] = 500
df['global_middle_price'] = df[['global_buy_price', 'global_sell_price']].mean(axis=1)
df['my_buy_price'] = [numpy.random.randint(350, 550) for x in range(len(df))]
df['I bought'] = '?'
print(df)
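For reference, the four rules above could probably be vectorized along the following lines. This is only a minimal sketch, not a tested solution for the full 1,000,000 rows. The helper name simulate_bought is hypothetical, the example frame is rebuilt with numpy's newer Generator API (so the random prices will differ from the frame generated above), and a price that sits exactly on a boundary is assumed to fall into the lower band, since the rules do not say how ties should be handled.

```python
import numpy as np
import pandas as pd


def simulate_bought(df, rng):
    """Return a 0/1 integer array: 1 where a purchase is simulated, else 0.

    Hypothetical helper; column names follow the question.
    """
    price = df["my_buy_price"].to_numpy()
    buy = df["global_buy_price"].to_numpy()
    mid = df["global_middle_price"].to_numpy()
    sell = df["global_sell_price"].to_numpy()

    # numpy.select checks the conditions in order and takes the first match,
    # which mirrors an if/elif chain but runs on whole columns at once.
    # Assumption: a price exactly on a boundary falls into the lower band.
    conditions = [price > sell, price > mid, price > buy]
    probabilities = [1.0, 0.8, 0.2]
    prob = np.select(conditions, probabilities, default=0.0)

    # One uniform draw in [0, 1) per row; the row counts as "bought"
    # when the draw falls below that row's probability.
    return (rng.random(len(df)) < prob).astype(int)


# Small example frame with the same columns as in the question.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "my_buy_price": rng.integers(350, 550, size=10),
    "global_buy_price": 400,
    "global_sell_price": 500,
})
df["global_middle_price"] = (df["global_buy_price"] + df["global_sell_price"]) / 2
df["I bought"] = simulate_bought(df, rng)
print(df)
```

The idea is to compute one probability per row first and only then draw the random numbers, so no Python-level function is called per row. Rows above global_sell_price always get a 1 and rows at or below global_buy_price always get a 0, because the draw is always at least 0 and strictly less than 1.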