I have a table with a column of text data. I want to get the frequency counts for each word so I have this code:

cm9_list = (df.cm9.str.split(expand=True).stack().value_counts()).reset_index()

which produces a DataFrame-like object (dtypes reports object type for it). I then change the column headers:

cm9_list.columns.values[0] = 'word'
cm9_list.columns.values[1] = 'frequency'

and then I want to remove the record in the table in the word column that has the 'nan' value (I do some text processing before this to strip punctuation and stop words etc. so I think these 'nan' values were inserted in null cells during that process.)

I am getting an error when I try to run this code:

cm9_list = cm9_list[cm9_list.columns[0] != 'nan']

That says:

KeyError: True

And I have also tried:

cm9_list = cm9_list[cm9_list['word'] != 'nan']

and get this:

KeyError: 'word'

I have no idea what these errors mean. All I can think of is that it doesn't recognize word as a column name. When I check the column names though, it looks normal:

Index(['word', 'frequency'], dtype='object')

What could be the issue? TIA!!

pav

1 Answer

In your first attempt, the expression inside the brackets (cm9_list.columns[0] != 'nan') compares the column label itself with 'nan', so it evaluates to a single True, and True isn't a key in cm9_list.
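To make the KeyError: True case concrete, here is a small sketch. The two-row table is made-up stand-in data, not your real cm9_list. It shows that comparing a column label with a string gives one plain boolean, while comparing the column itself gives one boolean per row, which is what row filtering needs.

```python
import pandas as pd

# Hypothetical stand-in for the word/frequency table (not the asker's data)
cm9_list = pd.DataFrame({"word": ["red", "nan"], "frequency": [2, 1]})

# Compares the label 'word' with 'nan': a single Python bool
label_check = cm9_list.columns[0] != "nan"

# Compares every value in the 'word' column with 'nan': a boolean Series
row_mask = cm9_list["word"] != "nan"

print(type(label_check))
print(row_mask.tolist())
```

Passing label_check to cm9_list[...] asks for a column named True, which is why the error names True as the missing key.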

Likewise, in your second attempt, KeyError: 'word' means the lookup did not find a key named 'word' in cm9_list.
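As for how to write it instead, here is a minimal sketch under a couple of assumptions. The sample df is invented for illustration. I am also assuming (I have not checked this against your pandas version) that writing into cm9_list.columns.values changes the labels that get printed without updating the lookup pandas uses internally, which would explain why 'word' shows up in the Index but is not found. Assigning a whole new list to cm9_list.columns avoids that.

```python
import pandas as pd

# Hypothetical sample data standing in for the df.cm9 column
df = pd.DataFrame({"cm9": ["red apple", "nan", "red pear", "nan"]})

# Same word count as in the question
cm9_list = df.cm9.str.split(expand=True).stack().value_counts().reset_index()

# Rename by assigning a full list instead of editing columns.values in place
cm9_list.columns = ["word", "frequency"]

# Filter on the column's values (a boolean Series), not on the column label
cm9_list = cm9_list[cm9_list["word"] != "nan"]

print(cm9_list)
```

If the unwanted entries are real missing values rather than the text 'nan', cm9_list.dropna(subset=['word']) would be the call to try instead.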

Winter Squad
  • Thank you - I am not sure how I would write it instead. Can you please help with the syntax? Thank you again! – pav Dec 06 '21 at 18:24
  • Sorry for the update. I was using this thread for the syntax above: https://stackoverflow.com/questions/18172851/deleting-dataframe-row-in-pandas-based-on-column-value – pav Dec 06 '21 at 18:25