pandas.Series.map works a little differently than Python's built-in map.
Assume you have a small dictionary that holds the roots of some commonly used words:
roots_dict = {"going": "go", "went": "go", "took": "take", "does": "do",
              "thought": "think", "came": "come", "begins": "begin"}
You also have a pandas DataFrame with a column of words:
import pandas as pd
df = pd.DataFrame({"word": ["took", "gone", "done", "begins", "came",
                            "thought", "took", "went"]})
word
0 took
1 gone
2 done
3 begins
4 came
5 thought
6 took
7 went
If you want an additional column that shows the roots of these words, you can use map. For each element in that Series (column), map checks whether the word exists as a key in the dictionary. If it does, map returns the corresponding value; otherwise it returns NaN:
df["root"] = df["word"].map(roots_dict)
word root
0 took take
1 gone NaN
2 done NaN
3 begins begin
4 came come
5 thought think
6 took take
7 went go
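To make the contrast with Python's built-in map concrete, here is a minimal sketch (the `words` list just repeats the column above). The built-in map needs a callable, so the dictionary's `.get` method is passed instead of the dictionary itself, and it returns a lazy iterator rather than a Series.

```python
import pandas as pd

roots_dict = {"going": "go", "went": "go", "took": "take", "does": "do",
              "thought": "think", "came": "come", "begins": "begin"}
words = ["took", "gone", "done", "begins", "came", "thought", "took", "went"]

# Built-in map: the first argument must be callable (passing roots_dict itself
# would raise TypeError once iterated). dict.get yields None for missing keys.
builtin_result = list(map(roots_dict.get, words))

# Series.map: the dict can be passed directly; the result is a new Series
# with NaN wherever the word is not a key.
series_result = pd.Series(words).map(roots_dict)

print(builtin_result)
print(series_result.tolist())
```

Note the different "missing" markers: None from the built-in version, NaN from pandas.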
You can also pass a Series instead of a dictionary. In that case, map looks up each element in the Series' index and returns the value stored at that index label (again NaN if the label is missing).
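A short sketch of the Series variant, reusing the same words: building a Series from the dictionary puts the dictionary keys in the index, so the lookup produces the same "root" column as before. The name `roots_series` is just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"word": ["took", "gone", "done", "begins", "came",
                            "thought", "took", "went"]})

# The index holds the lookup keys; the values hold what map should return.
roots_series = pd.Series({"going": "go", "went": "go", "took": "take",
                          "does": "do", "thought": "think", "came": "come",
                          "begins": "begin"})

# Each word is looked up in roots_series.index; unmatched words become NaN.
df["root"] = df["word"].map(roots_series)
print(df)
```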
In your example, map is given a function. That function takes a string (possibly containing several words), converts it to lowercase, splits it into words, and applies NLTK's Snowball stemmer to each word. So with
df_all['search_term'].map(lambda x: str_stemmer(x))
each row in your "search_term" column (x being the string in that row) is passed as input to str_stemmer(), and .map collects the values returned by that function into another Series that holds the roots for all the words.
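The function case can be sketched without NLTK. The question's str_stemmer() is not shown here, so `toy_stemmer` below is a hypothetical stand-in: it follows the same lowercase/split/per-word steps, but swaps the Snowball stemmer for a plain dictionary lookup so the example runs on its own. The sample search terms are made up for illustration.

```python
import pandas as pd

# Hypothetical replacement for a real stemmer (e.g. NLTK's SnowballStemmer):
# words found in this table are replaced, all others are kept unchanged.
roots_dict = {"going": "go", "went": "go", "took": "take", "does": "do",
              "thought": "think", "came": "come", "begins": "begin"}

def toy_stemmer(s):
    # Lowercase, split on whitespace, "stem" each word, and rejoin with spaces.
    words = str(s).lower().split()
    return " ".join(roots_dict.get(w, w) for w in words)

df_all = pd.DataFrame({"search_term": ["Went Home", "thought it took long",
                                       "begins now"]})

# map calls toy_stemmer once per row and gathers the returned strings into a
# new Series. Passing the function directly is equivalent to wrapping it in
# lambda x: toy_stemmer(x).
df_all["stemmed"] = df_all["search_term"].map(toy_stemmer)
print(df_all)
```

Because map already calls the function with each element, the lambda wrapper in the original line adds nothing; `.map(str_stemmer)` behaves the same.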