
I need help understanding the following code:

df_all['search_term'] = df_all['search_term'].map(lambda x:str_stemmer(x))

Link to the complete code: https://www.kaggle.com/wenxuanchen/home-depot-product-search-relevance/sklearn-random-forest/code
Thank you.

dcrosta
sbk23
  • Well, it's hard to say without seeing the definition of `str_stemmer`, but basically it calls `str_stemmer` on each element of your column `'search_term'` – EdChum May 05 '16 at 12:22
  • http://stackoverflow.com/questions/10973766/understanding-the-map-function – Dmitry Yudin May 05 '16 at 12:24
    The [docs](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.map.html) give an example but it's unclear what your lack of understanding is – EdChum May 05 '16 at 12:32

2 Answers


I looked at the other questions, and they don't really answer what you're asking: what does the map function do?

map takes a function and an iterable, and applies the function to each element of the iterable in turn.

Here's an example:

def square_the_things(value):
    print('Squaring {}'.format(value))
    return value * value


items = [1,2,3,4,5]
squared_items = map(square_the_things, items)

for squared in squared_items:
    print('Squared item is: {}'.format(squared))

Output

Squaring 1
Squared item is: 1
Squaring 2
Squared item is: 4
Squaring 3
Squared item is: 9
Squaring 4
Squared item is: 16
Squaring 5
Squared item is: 25

Note that we pass the function itself to map, i.e. its name without () at the end. A lambda is just a function with no name. In your case, you could simply write .map(str_stemmer), since str_stemmer takes exactly one argument and the lambda only forwards that argument to it.
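A minimal sketch to check that the lambda wrapper changes nothing. Since the real str_stemmer lives in the linked kernel, it is replaced here by a hypothetical one-argument function, shout:

```python
def shout(text):
    # Hypothetical one-argument function standing in for str_stemmer
    return text.upper()

words = ["going", "went", "took"]

# Wrapping the function in a lambda...
with_lambda = list(map(lambda x: shout(x), words))
# ...gives the same result as passing the function object directly
direct = list(map(shout, words))

print(with_lambda == direct)  # True
print(direct)                 # ['GOING', 'WENT', 'TOOK']
```

The lambda form is only needed when you have to adapt the call, for example to pass an extra argument such as lambda x: f(x, 2).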

Walking through my example, you can see that the first output comes from inside the function (Squaring 1), and only then does the first iteration of the loop print Squared item is: 1. That is because I'm using Python 3, where map returns a lazy iterator that calls the function only when the loop asks for the next item. In Python 2, the output is different:

Squaring 1
Squaring 2
Squaring 3
Squaring 4
Squaring 5
Squared item is: 1
Squared item is: 4
Squared item is: 9
Squared item is: 16
Squared item is: 25

That's because Python 2's map applies the function to the whole iterable up front and returns a list.
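A small sketch of the difference, run under Python 3. Wrapping map in list() forces every call immediately, which mimics the eager Python 2 behaviour; a bare map object is evaluated lazily and can be consumed only once:

```python
def square_the_things(value):
    print('Squaring {}'.format(value))
    return value * value

items = [1, 2, 3]

# Python 3: map() returns a lazy iterator, so nothing is printed here
lazy = map(square_the_things, items)

# list() consumes the iterator at once, like Python 2's list-returning map()
eager = list(map(square_the_things, items))  # prints Squaring 1, 2, 3
print(eager)  # [1, 4, 9]

# The lazy iterator does its work only now, and only once
print(list(lazy))  # prints Squaring 1, 2, 3, then [1, 4, 9]
print(list(lazy))  # [] because the iterator is already exhausted
```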

Wayne Werner

pandas.Series.map works a little differently from Python's built-in map.

Assume you have a small dictionary that holds the roots of some commonly used words:

roots_dict = {"going": "go", "went": "go", "took": "take", "does": "do", 
              "thought": "think", "came": "come", "begins": "begin"}

You also have a pandas DataFrame with a column of words:

import pandas as pd

df = pd.DataFrame({"word": ["took", "gone", "done", "begins", "came",
                            "thought", "took", "went"]})

      word
0     took
1     gone
2     done
3   begins
4     came
5  thought
6     took
7     went

If you want an additional column that shows the roots of these words, you can use map. For each element in that Series (column), map checks whether the word exists as a key in the dictionary. If it does, it returns the corresponding value; otherwise it returns NaN:

df["root"] = df["word"].map(roots_dict)

      word   root
0     took   take
1     gone    NaN
2     done    NaN
3   begins  begin
4     came   come
5  thought  think
6     took   take
7     went     go

Instead of a dictionary, you can pass a Series too. In that case, map looks up each value in the index of that Series and returns the matching value (again NaN when there is no match).
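A short sketch of mapping with a Series instead of a dict, reusing roots_dict from above. Building the Series from the dict turns the words into the index and the roots into the values:

```python
import pandas as pd

roots_dict = {"going": "go", "went": "go", "took": "take", "does": "do",
              "thought": "think", "came": "come", "begins": "begin"}

# Index = words, values = roots
roots = pd.Series(roots_dict)

words = pd.Series(["took", "gone", "begins"])

# Each value in `words` is looked up in the index of `roots`;
# "gone" is not in that index, so its result is NaN
result = words.map(roots)
print(result)  # holds "take", NaN, "begin"
```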

In your example, map is given a function. str_stemmer is designed to take a string (possibly containing several words), convert it to lowercase, split it into words and apply NLTK's Snowball stemmer to each word. So with df_all['search_term'].map(lambda x: str_stemmer(x)), each row of your "search_term" column (x being the string in that row) is passed to str_stemmer(). .map collects the values returned by that function and gives you back another Series, with the same index, in which every word has been reduced to its stem.

ayhan
  • Thank you, ayhan. I also followed up from a [link](http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.Series.apply.html). 'lambda' is an anonymous function that creates the dictionary of a word related to its stemmed version, and then this dictionary is passed as a parameter to the map function. @ayhan – sbk23 May 06 '16 at 04:34