Vector function on Pandas Dataframe

Question

I want to calculate the frequency of a word in each sentence. My DataFrame has a "Title" column that contains a sentence (a string) in each row. This is my current approach:

# num times queryWord is in sentence / num words in sentence
list = df['Title'].str.count(queryWord) / len(df['Title'].str.split())

However, len(df['Title'].str.split()) returns the length of the "Title" column rather than the length of the array that is generated by split() in each row. How do I fix this?

score 1 · Accepted Answer · answered Jun 26 '18 at 19:29

This should do the trick:

# named freq rather than list, which would shadow the built-in
freq = df['Title'].str.count(queryWord) / df['Title'].str.split().str.len()

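df['Title'].str.split() returns a pd.Series of list objects, so calling len() on it gives the number of rows, while .str.len() gives the length of the list in each row. That's why this question was marked as a duplicate.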
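To make the difference concrete, here is a minimal sketch. The sample sentences and the `query_word` value are made up for illustration; only the "Title" column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the "Title" column name is from the question.
df = pd.DataFrame({"Title": ["the cat sat on the mat", "a dog barked"]})
query_word = "the"  # hypothetical query word

words = df["Title"].str.split()    # Series of lists, one list per row
n_rows = len(words)                # length of the Series, i.e. number of rows
words_per_row = words.str.len()    # Series with the length of each list

# Element-wise division: matches per row / words per row
freq = df["Title"].str.count(query_word) / words_per_row
print(freq)
```

Note that `.str.count` treats its argument as a regular expression and counts substring matches, so a query like "the" would also match inside "other". If whole-word matching is needed, a pattern with word boundaries may be a better fit.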

tobsecret

Thanks, that did it. What is the meaning of .str? – Luciano Jun 26 '18 at 19:33
Glad it worked, please accept the answer. In pandas, the `Series` object has string methods which you can access via `.str.method_name`. The reason you have to access them that way is that some of them share a name with another `Series` method that does something different. Examples are `pd.Series.str.replace`, which does not work the same way as `pd.Series.replace`, and `pd.Series.str.get`, which does not work the same way as `pd.Series.get`. – tobsecret Jun 26 '18 at 19:41
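
A small sketch of the name clash described in the comment above, using a made-up Series with the default integer index:

```python
import pandas as pd

s = pd.Series(["red apple", "red", "green"])  # hypothetical example data

# Series.replace with a scalar swaps values that match as a whole
whole = s.replace("red", "blue")

# Series.str.replace substitutes inside each string
inside = s.str.replace("red", "blue", regex=False)

# Series.get looks up a value by index label
by_label = s.get(0)

# Series.str.get indexes into each element (here: the first character)
first_chars = s.str.get(0)

print(whole.tolist())        # ['red apple', 'blue', 'green']
print(inside.tolist())       # ['blue apple', 'blue', 'green']
print(by_label)              # red apple
print(first_chars.tolist())  # ['r', 'r', 'g']
```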
