I downloaded the CSV file cleaned_depression_vs_suicide.csv from Kaggle to practice classification with different models. All my code works fine, and I successfully dropped the rows with a null title or body of the Reddit posts by using notnull() (df1 holds the data loaded from the CSV file):
df1 = df1[df1["Body"].notnull()]
df1 = df1[df1["Title"].notnull()]
However, a KeyError is raised on certain rows when the following loop runs:
for i in range(0, nRow - 1):
    result1 = preprocessing(df1["Title"][i])
    result2 = preprocessing(df1["Body"][i])
    df2.loc[i, 'cleaned_text'] = result1 + " " + result2
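To narrow this down, here is a minimal sketch on made-up data (not the Kaggle file) that reproduces the same kind of KeyError. It assumes the cause is that filtering with notnull() keeps the original row labels, so a plain counter can hit a label that was dropped; I have not confirmed that this is what happens in my dataset. The reset_index(drop=True) call at the end is one possible way to renumber the rows.

```python
import pandas as pd

# Toy data standing in for the Kaggle CSV (made up for illustration).
df = pd.DataFrame({
    "Title": ["a", None, "c", "d"],
    "Body": ["w", "x", None, "z"],
})

# Same filtering as above: rows 1 and 2 are dropped, labels 0 and 3 remain.
df = df[df["Body"].notnull()]
df = df[df["Title"].notnull()]
remaining_labels = list(df.index)

# df["Title"][i] looks rows up by label, so a dropped label raises KeyError.
missing = []
for i in range(len(df)):
    try:
        df["Title"][i]
    except KeyError:
        missing.append(i)

# Renumbering the rows makes 0..n-1 valid labels again.
df_reset = df.reset_index(drop=True)
titles = [df_reset["Title"][i] for i in range(len(df_reset))]
```

In this toy case the loop fails at i = 1 because only labels 0 and 3 survive the filtering, while after reset_index(drop=True) every counter value maps to a row.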
My preprocessing method:
Given a string s, it pre-processes s and returns the updated features and a bag-of-words representation. The steps are:
- split the text
- change it to lower case
- remove punctuation
- tokenize s using nlp (en_core_web_sm) to create a doc
- update the features with the lemmas encountered in s
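The steps above can be sketched roughly as follows. This is not my actual function: preprocess_sketch and its lemmatize parameter are hypothetical names, and the spaCy step is replaced by a plain whitespace split with an optional hook where a lemmatizer (for example one built on en_core_web_sm) could be plugged in.

```python
import string
from collections import Counter


def preprocess_sketch(s, lemmatize=None):
    """Hypothetical stand-in for preprocessing(); not the original code."""
    # Lower-case the text and strip ASCII punctuation.
    cleaned = s.lower().translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace; the real version tokenizes with spaCy instead.
    tokens = cleaned.split()
    # Optional hook (assumption): a callable that maps a token to its lemma.
    if lemmatize is not None:
        tokens = [lemmatize(t) for t in tokens]
    # Bag-of-words counts for the tokens that were kept.
    bag = Counter(tokens)
    return " ".join(tokens), bag


text, bag = preprocess_sketch("Hello, world! Hello again.")
```

Here the function returns both the cleaned text and the token counts, so the cleaned strings for the title and body can still be joined with a space as in the loop above.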
The error told me where the KeyError is, and I checked that row:
A normal row should look like this:
A few rows looked exactly like this before I did anything, and I edited the dataset manually to fix them. Each time I fix one, the rows from 0 up to that line work normally and the model accuracy, etc., is displayed. What could cause this, and how can I fix it without going through the CSV file row by row?