1

Can anyone help, i'm new to Python so bear with me.

My data looks like this but has all the region information available. I'm trying to create a new column 'actual price' that works out the price based on the region. as for every entry I have each price for every region. is this possible.

data = [[1, 'EDF', 'Eastern', 400, 500, 300], [2, 'EDF', 'Southern', 200, 100, 300], [3, 'NPower', 
        'Eastern', 600, 500, 700]] 


df = pd.DataFrame(data, columns = ['ID', 'Supplier', 'Region', 'Av Price', 'Eastern Price',  
'Southern Price']) 

df
em456
  • 359
  • 2
  • 11
  • 2
    What is your expected output for the sample data? – Quang Hoang Jun 01 '20 at 14:30
  • sorry that would have been helpful so ID Supplier Region Av Price Eastern Price Southern Price Price 1 EDF Eastern 400 500 300 500 2 EDF Southern 200 100 300 300 THe idea is to get rid of all the regional prices and just have the actual price if that makes sense. – em456 Jun 01 '20 at 14:33

3 Answers3

2

IIUC, you can do df.lookup here after adding " Price" to the values of the Region column to match the column names of the Price by region:

m = df.loc[:,df.columns.str.endswith("Price")]
df['actual_Price'] = m.lookup(df.index,df['Region'].add(" Price"))

print(df)
   ID Supplier    Region  Av Price  Eastern Price  Southern Price  \
0   1      EDF   Eastern       400            500             300   
1   2      EDF  Southern       200            100             300   
2   3   NPower   Eastern       600            500             700   

   actual_Price  
0           500  
1           300  
2           500  
anky
  • 74,114
  • 11
  • 41
  • 70
  • It appears this has worked but I have a warning SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. – em456 Jun 01 '20 at 15:28
  • @LMR is your dataframe a reference of any other dataframe or is it the first/source dataframe? – anky Jun 01 '20 at 15:31
  • it was created from taking columns from 2 other dataframes if that makes sense? – em456 Jun 01 '20 at 15:38
  • @LMR guess there is some issue there. use `.copy()` when referencing the other dataframes, check [this](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) – anky Jun 01 '20 at 15:41
  • Thank you @anky that has solved the problem, you have probably jsut saved me a pile of time trying to work it out – em456 Jun 01 '20 at 15:50
1

I believe this is what you're looking for:

df["actual_price"] = np.where(df.Region == "Eastern", df["Eastern Price"], df["Southern Price"])

result:

enter image description here

Roy2012
  • 11,755
  • 2
  • 22
  • 35
1

Use, np.select:

conditions = [df['Region'].eq(reg) for reg in df['Region'].unique()]
choices = [df[f'{reg} Price'] for reg in df['Region'].unique()]
df['actual_price'] = np.select(conditions, choices)

Result:

# print(df)
   ID Supplier    Region  Av Price  Eastern Price  Southern Price  actual_price
0   1      EDF   Eastern       400            500             300           500
1   2      EDF  Southern       200            100             300           300
2   3   NPower   Eastern       600            500             700           500
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53