I have a data frame called df and would like to add a column "Return" computed from the existing columns using a lambda function. For each row, if the value of "Field_3" is less than 50, "Return" should take the value of "Field_1"; otherwise it should take the value of "Field_2". My code raises a ValueError: Wrong number of items passed 7, placement implies 1. I'm a Python beginner, so any help would be appreciated.

import pandas as pd

values_list = [[15, 2.5, 100], [20, 4.5, 50], [25, 5.2, 80],
               [45, 5.8, 48], [40, 6.3, 70], [41, 6.4, 90],
               [51, 2.3, 111]]

df = pd.DataFrame(values_list, columns=['Field_1', 'Field_2', 'Field_3'])

# This is the line that raises the ValueError
df["Return"] = df["Field_3"].apply(lambda x: df['Field_1'] if x < 50 else df['Field_2'])
M D

1 Answer

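The syntax here is a little tricky. Your lambda returns the whole column df['Field_1'] or df['Field_2'] for every element of Field_3 rather than a single value, which is why pandas complains about the number of items. You want: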

def col_sorter(x, y, z):
    if z < 50:
        return x
    else:
        return y

df['return'] = df[['Field_1', 'Field_2', 'Field_3']].apply(lambda x: col_sorter(*x), axis=1)

out:
   Field_1  Field_2  Field_3  return
0       15      2.5      100     2.5
1       20      4.5       50     4.5
2       25      5.2       80     5.2
3       45      5.8       48    45.0
4       40      6.3       70     6.3

Here's what's going on:

  1. Define a function col_sorter that takes in the variables from one row of the dataframe and does what you want with it. (This isn't strictly necessary, but it's a good habit to form since not every transformation you want to do will be as simple as this, and this syntax scales.)
  2. Then call apply with axis=1 on the columns you need from the dataframe. apply passes each row to the lambda as a Series, and the *x unpacks that row into the three arguments col_sorter expects.
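
To make the two steps above concrete, here is a minimal sketch of what apply does with each row. It reuses the data from the question; the names cols, one_row, unpacked, manual and applied are only illustrative, and the explicit loop is there to show the mechanics, not as a recommended way to write it.

```python
import pandas as pd

values_list = [[15, 2.5, 100], [20, 4.5, 50], [25, 5.2, 80],
               [45, 5.8, 48], [40, 6.3, 70], [41, 6.4, 90],
               [51, 2.3, 111]]
df = pd.DataFrame(values_list, columns=['Field_1', 'Field_2', 'Field_3'])

def col_sorter(x, y, z):
    # Return Field_1 (x) when Field_3 (z) is below 50, otherwise Field_2 (y).
    if z < 50:
        return x
    else:
        return y

cols = ['Field_1', 'Field_2', 'Field_3']

# With axis=1, apply hands the function one row at a time as a Series.
one_row = df[cols].iloc[3]

# *one_row unpacks the three values of that row into x, y and z.
unpacked = col_sorter(*one_row)

# Roughly what apply(axis=1) does: call the function once per row.
manual = [col_sorter(*r) for _, r in df[cols].iterrows()]
applied = df[cols].apply(lambda r: col_sorter(*r), axis=1)
```

Row 3 has Field_3 = 48, so unpacked is that row's Field_1 value, and manual holds the same values as applied.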

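This pattern will let you create a new column from arbitrarily complex calculations based on other columns in the dataframe. It is slower than a vectorized solution, because it calls a Python function once per row, but it is usually fast enough for small and medium-sized dataframes.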
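
For comparison, here is one possible vectorized version of this particular rule. It uses numpy.where, which is not part of the answer above, and writes to a column named "Return" as in the question. Treat it as a sketch for simple either/or conditions like this one; the apply pattern is still the more general tool.

```python
import numpy as np
import pandas as pd

values_list = [[15, 2.5, 100], [20, 4.5, 50], [25, 5.2, 80],
               [45, 5.8, 48], [40, 6.3, 70], [41, 6.4, 90],
               [51, 2.3, 111]]
df = pd.DataFrame(values_list, columns=['Field_1', 'Field_2', 'Field_3'])

# np.where evaluates the condition for the whole column at once:
# take Field_1 where Field_3 < 50, otherwise take Field_2.
df['Return'] = np.where(df['Field_3'] < 50, df['Field_1'], df['Field_2'])
```

Because Field_1 holds integers and Field_2 holds floats, the resulting column is float, just like the output of the apply version.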

For a lot more detail and discussion, see here.

Welcome to Python and Stack Overflow!

BLimitless
  • Thank you so much, BLimitless. Great to learn lambda function further. – M D Jun 14 '21 at 00:04
  • When you call `apply` here, you're calling it on df[['Field_1', 'Field_2', 'Field_3']]. With axis=1, `apply` takes one row containing the entries from each of those columns and passes it into the function as a `Series`. The `*x` unpacks that `Series` into the individual arguments the function expects. – BLimitless Jun 14 '21 at 00:08
  • Fantastic. I didn't know lambda can be used in this way. Thanks a lot. – M D Jun 14 '21 at 00:17