I have a dataframe and want to apply a function that takes one value and returns two values. I used .apply(get_data).transpose().values to put the results into the dataframe. It worked when the dataframe had only two rows but failed with more than two rows, raising the "Too many values to unpack (expected 2)" error.

oil_df = pd.DataFrame({
    "Oils":["Oil 1","Oil 2","Oil 3"], 
    "Price":["","",""], 
    "Unit":["","",""]})
def get_data(oil):
    if oil == "Oil 1":
        price = 20
        unit = 50
    if oil == "Oil 2":
        price = 30
        unit = 75
    if oil == "Oil 3":
        price = 40
        unit = 100
    return(price, unit)
oil_df["Price"], oil_df["Unit"] = oil_df["Oils"].apply(get_data).transpose().values

At first, I couldn't find a way to apply the function, so I split it into two pieces and applied them one by one, but that took much longer, as expected. I found the current approach with the help of this answer. Passing axis=1, result_type='expand' gave me a "get_data() got an unexpected keyword argument 'axis'" error, so I removed that part. I'm open to any suggestions to make this work, or to another way of applying this function to the dataframe. Thank you!

2 Answers

Try loc or np.select

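import numpy as np

for column, values in [('Price', [20, 30, 40]), ('Unit', [50, 75, 100])]:
    # Option 1, loc: assign the values, in order, to the rows whose oil is listed
    # oil_df.loc[oil_df['Oils'].isin(['Oil 1', 'Oil 2', 'Oil 3']), column] = values
    # Option 2, np.select: for each row, pick the value whose condition matches
    oil_df[column] = np.select(condlist=[oil_df['Oils'] == f'Oil {i}' for i in range(1, 4)], choicelist=values)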

output

    Oils Price Unit
0  Oil 1    20   50
1  Oil 2    30   75
2  Oil 3    40  100
Shuo
  • Thank you for your answer. I'm new, so I might have misunderstood it, but this approach would be hard to use in my case. The data in the function were only examples; the real function scrapes the data from a website. Also, the dataframe will change quite often. – Tarık Sülüç Jun 08 '23 at 03:06

Not sure what you are really trying to achieve with this, but apply returns a Series of 3 tuples (one each for Oil 1, 2, and 3), and transposing a Series changes nothing. So .values is an array of 3 items, which is too many values to unpack into 2 targets.
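
To make the failure concrete, here is a small sketch of what the right-hand side of that assignment contains. The dictionary lookup is only a stand-in for the question's get_data (an assumption, to keep the example short); the values are the ones from the question.

```python
import pandas as pd

def get_data(oil):
    # Stand-in for the question's function: returns a (price, unit) tuple
    table = {"Oil 1": (20, 50), "Oil 2": (30, 75), "Oil 3": (40, 100)}
    return table[oil]

oils = pd.Series(["Oil 1", "Oil 2", "Oil 3"])

# apply gives a Series holding one tuple per row; transpose leaves it unchanged
values = oils.apply(get_data).transpose().values
print(len(values))  # one item per row, not one per column

error_message = None
try:
    price, unit = values  # three items cannot be unpacked into two names
except ValueError as err:
    error_message = str(err)
print(error_message)

# With two rows the unpacking succeeds, but each name receives a row's tuple
first, second = oils.head(2).apply(get_data).values
print(first, second)
```

In the two-row case each name gets one row's (price, unit) tuple rather than one column, so the assignment only appeared to work.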

If your intent is to fill in the second and third columns, have the function return a Series for each row, so that apply produces a dataframe with 2 columns, and then assign that result to the dataframe's 2 columns:

import pandas as pd

oil_df = pd.DataFrame({
    "Oils":["Oil 1","Oil 2","Oil 3"], 
    "Price":["","",""], 
    "Unit":["","",""]})

def get_data(oil):
    if oil == "Oil 1":
        price = 20
        unit = 50
    if oil == "Oil 2":
        price = 30
        unit = 75
    if oil == "Oil 3":
        price = 40
        unit = 100
    return pd.Series([price, unit])

oil_df[['Price', 'Unit']] = oil_df["Oils"].apply(get_data)

oil_df

returns

    Oils  Price  Unit
0  Oil 1     20    50
1  Oil 2     30    75
2  Oil 3     40   100

in this case

oil_df["Oils"].apply(get_data)

returns

    0    1
0  20   50
1  30   75
2  40  100

and assigning this to oil_df[['Price', 'Unit']] fills those two columns with the returned values, row by row.
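
If you would rather keep get_data returning a plain tuple, the sketch below shows two other ways that should give the same result. The dictionary inside get_data is a stand-in for the real function (an assumption); the dataframe and column names come from the question. The second variant also suggests why axis=1, result_type='expand' failed in the question: those are DataFrame.apply parameters, whereas Series.apply forwards unknown keyword arguments to the function itself.

```python
import pandas as pd

def get_data(oil):
    # Stand-in for the real function: returns a (price, unit) tuple
    table = {"Oil 1": (20, 50), "Oil 2": (30, 75), "Oil 3": (40, 100)}
    return table[oil]

oil_df = pd.DataFrame({
    "Oils": ["Oil 1", "Oil 2", "Oil 3"],
    "Price": ["", "", ""],
    "Unit": ["", "", ""]})

# Variant 1: collect the tuples, then build a two-column frame in one step
pairs = oil_df["Oils"].apply(get_data).tolist()
expanded = pd.DataFrame(pairs, index=oil_df.index, columns=["Price", "Unit"])
oil_df[["Price", "Unit"]] = expanded

# Variant 2: apply on the DataFrame row by row, where axis and result_type exist
via_rows = oil_df.apply(
    lambda row: get_data(row["Oils"]), axis=1, result_type="expand")
via_rows.columns = ["Price", "Unit"]

print(oil_df)
print(via_rows)
```

Building one frame from a list of tuples is usually cheaper than creating a Series for every row, which may matter if the dataframe grows.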

MrE
  • I don't know why I couldn't think of returning the data as a dataframe. I guess I was too busy scratching my head. Thank you! – Tarık Sülüç Jun 08 '23 at 03:08