I have a dataframe that consists of two columns: ID and TEXT. Pretend data is below:
ID TEXT
265 The farmer plants grain. The fisher catches tuna.
456 The sky is blue.
434 The sun is bright.
921 I own a phone. I own a book.
I know all nltk functions do not work on dataframes. How could sent_tokenize be applied to the above dataframe?
When I try:
df.TEXT.apply(nltk.sent_tokenize)
The output is unchanged from the original dataframe. My desired output is:
TEXT
The farmer plants grain.
The fisher catches tuna.
The sky is blue.
The sun is bright.
I own a phone.
I own a book.
In addition, I would like to tie back this new (desired) dataframe to the original ID
numbers like this (following further text cleansing):
ID TEXT
265 'farmer', 'plants', 'grain'
265 'fisher', 'catches', 'tuna'
456 'sky', 'blue'
434 'sun', 'bright'
921 'I', 'own', 'phone'
921 'I', 'own', 'book'
This question is related to another of my questions here. Please let me know if I can provide anything to help clarify my question!