
I want to calculate the frequency of a word in a sentence. My dataframe has a "Title" column that contains one sentence (a string) per row. This is my current approach:

# num times queryWord is in sentence / num words in sentence
list = df['Title'].str.count(queryWord) / len(df['Title'].str.split())

However, len(df['Title'].str.split()) returns the length of the "Title" column rather than the length of the array that is generated by split() in each row. How do I fix this?

Luciano

1 Answer


This should do the trick:

# named freq rather than list, so the built-in list is not shadowed
freq = df['Title'].str.count(queryWord) / df['Title'].str.split().str.len()

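df['Title'].str.split() returns a pd.Series of list objects. Calling len() on that Series gives the number of rows, whereas .str.len() gives the length of each list, i.e. the number of words in each row. That's why this question was marked as a duplicate.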
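To make the difference concrete, here is a minimal sketch on made-up data. The sample titles and the value of queryWord are assumptions for illustration only, not taken from the question. Note that str.count treats its argument as a regular expression and also counts matches inside longer words, so a query like "the" would match inside "other" as well.

```python
import pandas as pd

# Hypothetical sample data; the real df and queryWord come from the asker's code.
df = pd.DataFrame({"Title": ["the cat sat on the mat", "a dog", "the end"]})
queryWord = "the"

# Number of rows in the column (what len() on the Series returns).
n_rows = len(df["Title"].str.split())

# Occurrences of queryWord in each row.
counts = df["Title"].str.count(queryWord)

# Number of whitespace-separated words in each row.
words_per_row = df["Title"].str.split().str.len()

# Per-row frequency: occurrences divided by word count.
freq = counts / words_per_row
print(freq.tolist())
```

If only whole-word matches should count, one option is to pass a pattern with word boundaries, for example r"\bthe\b", instead of the bare word.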

tobsecret
  • Thanks, that did it. What is the meaning of .str? – Luciano Jun 26 '18 at 19:33
  • Glad it worked, please accept the answer. In pandas, the `Series` object has string methods which you can access via `.str.method_name`. The reason you have to access them that way is that some of them have the same name as another method that has a different use. Examples of this are `pd.Series.str.replace` which does not work the same way as `pd.Series.replace` and `pd.Series.str.get` which does not work the same way as `pd.Series.get` – tobsecret Jun 26 '18 at 19:41
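The name clashes mentioned in the last comment can be seen in a short sketch. The Series contents below are made up for illustration; the calls themselves are standard pandas methods.

```python
import pandas as pd

# Hypothetical example Series with the default integer index (0, 1, 2).
s = pd.Series(["cat", "concat", "dog"])

# Series.replace swaps whole values that equal "cat".
whole = s.replace("cat", "bird")

# Series.str.replace substitutes "cat" inside every string.
within = s.str.replace("cat", "bird", regex=False)

# Series.get looks up an element of the Series by index label.
by_label = s.get(1)

# Series.str.get takes the element at position 0 from each string.
first_chars = s.str.get(0)

print(whole.tolist(), within.tolist(), by_label, first_chars.tolist())
```

The same accessor is what makes .str.len() in the answer operate on each row's list rather than on the Series as a whole.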