I have a data frame that looks like this:
print(df)
Text
0|This is a text
1|This is also text
What I want: I would like to loop over the Text column of the data frame and create a new column holding the derived information, like this:
Text | Derived_text
0|This is a text | Something
1|This is also text| Something
Code: I have written the following code (I'm using spaCy, by the way):
for i in df['Text'].tolist():
    doc = nlp(i)
    resolved = [(doc._.coref_resolved) for docs in doc.ents]
    df = df.append(pd.Series(resolved), ignore_index=True)
Problem: The appended series ends up in new rows instead of next to the original text, so the result looks like this:
Text | Derived_text
0|This is a text | NaN
1|This is also text| NaN
2|NaN | Something
3|NaN | Something
I have also tried just saving the results into a list, but the list does not include the NaN values that can occur during the derivation loop. I need the NaN values to be kept so that I can match the original text with the derived text by index position.
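To make the goal concrete, here is a minimal sketch of the shape I am after. It is only an illustration under assumptions: derive is a hypothetical placeholder for the spaCy step (it is not my real code and not a spaCy function), and I am assuming it returns None when nothing can be derived. The idea is to collect exactly one value per row, using NaN as the placeholder, so the list lines up with the index of the data frame.

```python
import pandas as pd


def derive(text):
    # Hypothetical stand-in for the spaCy step (e.g. nlp(text)._.coref_resolved).
    # Assumed to return None when nothing can be derived for this row.
    return text.upper() if "also" in text else None


df = pd.DataFrame({"Text": ["This is a text", "This is also text"]})

derived = []
for text in df["Text"]:
    result = derive(text)
    # Always append exactly one value per row; NaN keeps the positions aligned.
    derived.append(result if result is not None else float("nan"))

# The list has the same length as the data frame, so it can become a column.
df["Derived_text"] = derived
print(df)
```

If this is the right direction, I assume df["Text"].apply(derive) would give the same alignment without the explicit loop, since it also produces one value per row.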