0

I encountered this lambda expression today and can't understand how it's used:

data["class_size"]["DBN"] = data["class_size"].apply(lambda x: "{0:02d}{1}".format(x["CSD"], x["SCHOOL CODE"]), axis=1)

The line of code doesn't seem to call the lambda function or pass any arguments into it, so I'm confused how it does anything at all. The purpose of this is to take two columns, CSD and SCHOOL CODE, and combine the entries in each row into a new column, DBN. So does this lambda expression ever get used?

M. Rios
  • Looking at pandas documentation, `apply` applies a function passed in as an argument to something, this `lambda` acts as that function which will be passed in as an argument, which will then be used by the `apply` function – Professor_Joykill Aug 09 '17 at 14:41
  • Posting an example DataFrame would be helpful. Then narrow down what you are confused about with respect to that DataFrame. – Alex Aug 09 '17 at 14:51
  • Why is it that you use `data["class_size"]["DBN"]` instead of `data["DBN"]` – A.Kot Aug 09 '17 at 15:01
  • Yes. `apply` can accept a lambda expression: http://pandas.pydata.org/pandas-docs/stable/10min.html#apply – Paul H Aug 09 '17 at 15:31
  • Apply can accept a lambda expression but you've used apply to a column rather than an entire dataframe. See answer below. – A.Kot Aug 09 '17 at 16:29

4 Answers

2

You're writing your results to the column incorrectly: data["class_size"]["DBN"] is not the correct way to select the column to write to. You've also called apply on a single column, whereas you want to apply the function across the entire dataframe so that the lambda can see both CSD and SCHOOL CODE:

data["DBN"] = data.apply(lambda x: "{0:02d}{1}".format(x["CSD"], x["SCHOOL CODE"]), axis=1)
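A minimal sketch of the line above in action, assuming a small made-up DataFrame (the values are invented for illustration and are not the asker's data). With `axis=1`, `apply` calls the lambda once per row and passes that row in as `x`, which is why the lambda never needs to be called explicitly:

```python
import pandas as pd

# Hypothetical sample data; the real "CSD" and "SCHOOL CODE" values
# are not shown in the question.
data = pd.DataFrame({"CSD": [1, 12], "SCHOOL CODE": ["M015", "X042"]})

# apply() invokes the lambda for every row (axis=1); x is that row as a Series,
# so both columns can be looked up by name.
data["DBN"] = data.apply(
    lambda x: "{0:02d}{1}".format(x["CSD"], x["SCHOOL CODE"]), axis=1
)

print(data["DBN"].tolist())
```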
A.Kot
  • there are two issues with this approach: 1) `Series.apply()` has no `axis` parameter. 2) when we do `data["class_size"].apply(lambda x: ...)` we can't access other columns via `x`. Solution: `data.apply(lambda x: "{0:02d}{1}".format(x["CSD"], x["SCHOOL CODE"]), axis=1)`. But there might be better __vectorized__ solutions – MaxU - stand with Ukraine Aug 09 '17 at 15:29
  • [Docs for `Series.apply()` ...](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.apply.html) – MaxU - stand with Ukraine Aug 09 '17 at 15:45
  • @MaxU You're right. I've applied the changes to my code. I assumed he was applying to a dataframe. – A.Kot Aug 09 '17 at 16:29
1

the apply method of a pandas Series takes a function as one of its arguments.

here is a quick example of it in action:

import pandas as pd

data = {"numbers":range(30)}

def cube(x):
    return x**3

df = pd.DataFrame(data)

df['squares'] = df['numbers'].apply(lambda x: x**2)

df['cubes'] = df['numbers'].apply(cube)

print(df)

gives:

   numbers  squares  cubes
0        0        0      0
1        1        1      1
2        2        4      8
3        3        9     27
4        4       16     64
...

as you can see, either defining a function (like cube) or using a lambda function works perfectly well.

As has already been pointed out, if you're having problems with your particular piece of code it's that you have data["class_size"]["DBN"] = ... which is incorrect. I was assuming that was an odd typo because you didn't mention getting a key error, which is what that would result in.


if you're confused about this, consider:

def list_apply(func, mylist):
    # call func on each item and collect the results
    newlist = []
    for item in mylist:
        newlist.append(func(item))
    return newlist

this is a (not very efficient) function for applying a function to every item in a list. if you used it with cube as before:

a_list = range(10)

print(list_apply(cube, a_list))

you get:

[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]

this is a simplistic example of how the apply function in pandas is implemented. I hope that helps?
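The same idea carries over to the row-wise case in the question. The sketch below is only an analogy in plain Python, not how pandas is actually implemented: `row_apply` is a hypothetical helper, and each dict stands in for one DataFrame row with made-up values.

```python
def row_apply(func, rows):
    """Call func once per row and collect the results (hypothetical helper)."""
    return [func(row) for row in rows]

# Each dict plays the role of one row; the values are invented for illustration.
rows = [
    {"CSD": 1, "SCHOOL CODE": "M015"},
    {"CSD": 12, "SCHOOL CODE": "X042"},
]

# The lambda is never called by name here: row_apply calls it for us,
# just as apply(..., axis=1) would call it for each DataFrame row.
dbn = row_apply(lambda x: "{0:02d}{1}".format(x["CSD"], x["SCHOOL CODE"]), rows)
print(dbn)
```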

Stael
  • I hadn't tried running this code as I found it while reading another project and I was only trying to understand the thought process and methods behind it. This helped a lot though in understanding how the apply function works and the lambda. The way I think of the dataframe is like a 2D array I suppose, so ["class_size"] was one of my local data sets and I wanted to add a column ["DBN"]. Is that not the right way to think of it? – M. Rios Aug 09 '17 at 20:23
  • the data frame is a 2d array, that seems perfectly fine - i'm not sure what you mean by 'one of my datasets' but if you are adding a column you want it to be `dataset[column_name] = x` where `x` is a `pandas.Series` (which you typically get from single column operations on a dataframe... I'm not being helpful am I?). – Stael Aug 10 '17 at 08:39
  • I guess the way to think about it might be that it works like `new_variable = operation(old_variable)` anything on the left is something to be created/replaced, anything that currently exists should be on the right of the `=`. – Stael Aug 10 '17 at 08:41
1

Are you using a multi-index dataframe (i.e. are there column hierarchies)? It's hard to tell without seeing your data, but I'm presuming that is the case, since data["class_size"].apply() on a normal dataframe would operate on a series (meaning the lambda wouldn't be able to find the columns you specified, and there would be an error!)

I actually found this answer which explains the problem of trying to create columns in multi-index dataframes. One confusing thing about multi-index column creation is that you can try to create a column the way you are doing and it will seem to run without any issues, but it won't actually create what you want. Instead, you need to change data["class_size"]["DBN"] = ... to data["class_size", "DBN"] = ... So, in full:

data["class_size","DBN"] = data["class_size"].apply(lambda x: "{0:02d}{1}".format(x["CSD"], x["SCHOOL CODE"]), axis=1)

Of course, if it isn't a multi-index dataframe then this won't help, and you should look towards one of the other answers.
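For reference, a minimal sketch of the multi-index case, assuming a DataFrame whose columns form a two-level hierarchy under "class_size". The column layout and values here are assumptions made for illustration; the asker's actual data is not shown.

```python
import pandas as pd

# Hypothetical two-level column index: ("class_size", "CSD") and
# ("class_size", "SCHOOL CODE").
cols = pd.MultiIndex.from_tuples(
    [("class_size", "CSD"), ("class_size", "SCHOOL CODE")]
)
data = pd.DataFrame([[1, "M015"], [12, "X042"]], columns=cols)

# data["class_size"] is a sub-DataFrame here, so axis=1 is valid and the
# lambda can look up both inner columns by name.
data["class_size", "DBN"] = data["class_size"].apply(
    lambda x: "{0:02d}{1}".format(x["CSD"], x["SCHOOL CODE"]), axis=1
)

print(data["class_size"]["DBN"].tolist())
```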

Clusks
0

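{0:02d} formats the first argument (the "CSD" value) as an integer zero-padded to a width of two digits; it does not mean two decimal places. {1} then inserts the second argument, so the two values are placed together to form 'DBN'.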
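A quick way to check what the format string does is to run it on sample values (made up here). For contrast, the sketch also shows `.2f`, which is the spec that produces two decimal places:

```python
csd = 1                 # hypothetical "CSD" value
school_code = "M015"    # hypothetical "SCHOOL CODE" value

# {0:02d}: argument 0 as an integer, zero-padded to at least 2 digits.
# {1}: argument 1 inserted as-is.
dbn = "{0:02d}{1}".format(csd, school_code)
print(dbn)  # 01M015

padded = "{0:02d}".format(7)          # '07'  (padded to width 2)
wide = "{0:02d}".format(123)          # '123' (already wider than 2, unchanged)
two_decimals = "{0:.2f}".format(7)    # '7.00' (two decimal places)
print(padded, wide, two_decimals)
```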

Carine Ng