0

I have searched for my exact issue to no avail. These two threads Creating a new column based on if-elif-else condition and create new pandas dataframe column based on if-else condition with a lookup guided my code though my code fails to execute.

Issues: I have a dataframe which I have example-reproduced below. Region attribute has two values only - a or b (or could have more), same for year, though region a could have both years etc. What I want to do is create a new column, "dollars", and look up the value for region, and if it is region "a" AND year is e.g 2006, takes sales in that row, and multiply with rate for that year and append value in new column - dollars. I am beginner and below is what I have so far - via function - and obviously executing the .apply function returns a ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0'). I am particularly interested in a more efficient implementation as the dataframe is rather large and would love to optimize computational efficiency.

import pandas as np

rate_2006, rate_2007 = 100, 200


c = {
'region': ["a", "a", "a", "a", "a", "b", "b", "b", "b", "a", "b"],
'year': [2006, 2007, 2007, 2006, 2006, 2006, 2007, 2007, 2007, 2006, 2007],
'sales': [500, 100, 2990, 15, 5000, 2000, 150, 300, 250, 1005, 600]
}

df1 = pd.DataFrame(c)
df1

def new_col(row): 
    if df1["region"] == "a" and df1["year"] == 2006:
        nc = row["sales"] * rate_2006
    elif df1["region"] == "a" and df1["year"] == 2007:
        nc = row["sales"] * rate_2007
    elif df1["region"] == "b" and df1["year"] == 2006:
        nc = row["sales"] * rate_2006
    else:
        nc = row["sales"] * rate_2007
    return nc

df1["Dollars"] = df1.apply(new_col, axis=1)
df1
wrangler
  • 3
  • 3

1 Answers1

0

The issue maybe due to the way you are using it. I don't know if it will help you. but I have re written the code as per my knowledge that is working.

import pandas as pd

rate_2006, rate_2007 = 100, 200


c = {
'region': ["a", "a", "a", "a", "a", "b", "b", "b", "b", "a", "b"],
'year': [2006, 2007, 2007, 2006, 2006, 2006, 2007, 2007, 2007, 2006, 2007],
'sales': [500, 100, 2990, 15, 5000, 2000, 150, 300, 250, 1005, 600]
}

df1 = pd.DataFrame(c)
print(df1)

def new_col(value): 
    if df1.loc[value,"region"] == "a" and df1.loc[value,"year"] == 2006:
        df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2006
    elif df1.loc[value,"region"] == "a" and df1.loc[value,"year"] == 2007:
        df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2007
    elif df1.loc[value,"region"] == "b" and df1.loc[value,"year"] == 2006:
        df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2006
    else:
        df1.loc[value,"Dollars"] = df1.loc[value,"sales"] * rate_2007

for value in range(len(df1)):
    new_col(value)
Sumit S Chawla
  • 3,180
  • 1
  • 14
  • 33
  • your implementation works on the example data set and provides what I want. I am still running it on my huge data and not sure when it will complete :-). Like I mentioned I know there is probably an efficient approach than my method, so will keep looking. – wrangler Nov 14 '17 at 19:37
  • There are definitely other efficient methods available. But you can at least start working with this code and can replace this part when you find a better approach. – Sumit S Chawla Nov 15 '17 at 04:53