how to solve for 1000s of variables in python with pandas package

Question

I am importing about 50,000 books, along with their prices, from a CSV file. I have loaded them into pandas with pd.read_excel and filtered the data. I am trying to figure out how to solve for the variable x for each book price. To be more specific, if a book has a suggested retail price of $18.99, I multiply that by one minus the discount that I receive from the publisher.

   val = samplefilinv['retail'] * (1 - samplefilinv['discount_level'])
   0    11.3940
   1     9.5936
   2     5.1136
   3    10.1940
   4     7.6736
   7    10.7940

So, in the example above, samplefilinv is the dataset containing the retail price of each book along with the discount level. If a book has a suggested retail price of $18.99 with a 40% discount, it comes out to $11.39 (that is the first row). I then take the fees associated with selling on Amazon, which are 15% + $1.50 for each book, and try to calculate the break-even price that I would have to charge. So, .85x - 15.79 - 1.80 = 0 would be the algebraic equation that I am trying to solve for the first row to get the break-even price (which is $20.69). This has to be done for 50,000 books, and I am trying to figure out how to do it. I have tried putting the dataset into a numpy array, and then putting that into a matrix where I could at least try to solve for zero. This hasn't worked, and I am not sure how I would convert the result back into a pandas format that I could use.

np.arange('val').reshape((2,3))

In the above, "val" is the retail price multiplied by (1 - discount). I received the following error message:

TypeError: unsupported operand type(s) for -: 'str' and 'int'

Welcome to StackOverflow. Please take the [tour](https://stackoverflow.com/tour) and learn [How to Ask](https://stackoverflow.com/help/how-to-ask). In order to get help, you will need to provide a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example). If your question includes a pandas dataframe, please provide a [reproducible pandas example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — alec_djinn, Sep 28 '22 at 12:37
One of your columns contains strings instead of numbers. Check the types of your columns with dtypes https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dtypes.html then assign the correct types. https://stackoverflow.com/questions/50642777/set-data-type-for-specific-column-when-using-read-csv-from-pandas — alec_djinn, Sep 28 '22 at 12:39

score 0 · Answer 1 · answered Sep 28 '22 at 12:58

I am going to guess that your dataframe holds strings, so that your df looks like this:

retailprice|discount_level
$18.99     |40%
$10.00     |10%

If we assume the format is like this, then you have to do an additional step before letting the math take over: you need to parse the strings to floats before you can perform math. A nice way would be to use a lambda function:

parse_price = lambda x: float(x[1:]) #dollar sign in front so just skip the first character
parse_disc = lambda x: float(x[:-1])/100 #percent sign at the back so just skip the last character

Of course there are other ways to parse but this should give you an idea of what you need to do.
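As one such alternative, here is a minimal sketch that first checks the column types (as suggested in the comments) and then parses whole columns at once with pandas string methods. The sample data and the variable names `df`, `retail`, and `discount` are hypothetical, and the `$18.99` / `40%` format is the same guess as above.

```python
import pandas as pd

# Hypothetical sample data in the string format guessed above.
df = pd.DataFrame({
    "retail": ["$18.99", "$10.00"],
    "discount_level": ["40%", "10%"],
})

# Inspect the column types: an object (or string) dtype means the column
# holds text, so arithmetic on it will fail or misbehave.
print(df.dtypes)

# Strip the currency/percent symbols, then convert to numbers.
# errors="coerce" turns unparseable entries into NaN instead of raising.
retail = pd.to_numeric(df["retail"].str.replace("$", "", regex=False), errors="coerce")
discount = pd.to_numeric(df["discount_level"].str.rstrip("%"), errors="coerce") / 100

print(retail)
print(discount)
```

With `errors="coerce"`, any malformed price shows up as NaN, so it can be found afterwards with `retail.isna()` rather than stopping the whole run.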

Then you can simply use what pandas dataframes are good for.

samplefilinv['Breakeven'] = samplefilinv['retail'].apply(parse_price) * (1 - samplefilinv['discount_level'].apply(parse_disc)) / 0.85 + 1.5
# or whatever your equation is. I assume Breakeven = (retail*(1-discount))/0.85 + 1.5. Didn't really understand your math.
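
On the "solve for x" part: the equation in the question is linear in x, so no solver or matrix is needed. If the fees are 15% of the sale price plus a fixed amount per book, then 0.85x - cost - fee = 0 rearranges to x = (cost + fee) / 0.85, and pandas can evaluate that for every row at once. The sketch below assumes the columns are already numeric and takes the 15% and $1.50 figures from the question's description; the function name `break_even`, the constants, and the sample data are hypothetical. Unlike the formula assumed above, the fixed fee is added before dividing by 0.85.

```python
import pandas as pd

FEE_RATE = 0.15   # percentage fee, assumed from the question's description
FIXED_FEE = 1.50  # fixed fee per book, assumed from the question's description

def break_even(retail: pd.Series, discount: pd.Series) -> pd.Series:
    """Solve (1 - FEE_RATE) * x - cost - FIXED_FEE = 0 for x, row by row."""
    cost = retail * (1 - discount)  # price paid after the publisher discount
    return (cost + FIXED_FEE) / (1 - FEE_RATE)

# Hypothetical numeric sample; in practice these would be the parsed columns.
df = pd.DataFrame({"retail": [18.99, 10.00], "discount_level": [0.40, 0.10]})
df["Breakeven"] = break_even(df["retail"], df["discount_level"]).round(2)
print(df)
```

Because the result is an ordinary column on the dataframe, nothing has to be converted back from numpy, and the same line works unchanged on 50,000 rows. If your cost terms differ (the question's own equation uses 15.79 and 1.80), substitute them for `cost` and `FIXED_FEE`.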
