I want to build a bag-of-words model and calculate relative word frequencies with the nltk package. My data is in a pandas DataFrame.
Here is my data:
text title authors label
0 On Saturday, September 17 at 8:30 pm EST, an e... Another Terrorist Attack in NYC…Why Are we STI... ['View All Posts', 'Leonora Cravotta'] Real
1 Story highlights "This, though, is certain: to... Hillary Clinton on police shootings: 'too many... ['Mj Lee', 'Cnn National Politics Reporter'] Real
2 Critical Counties is a CNN series exploring 11... Critical counties: Wake County, NC, could put ... ['Joyce Tseng', 'Eli Watkins'] Real
3 McCain Criticized Trump for Arpaio’s Pardon… S... NFL Superstar Unleashes 4 Word Bombshell on Re... [] Real
4 Story highlights Obams reaffirms US commitment... Obama in NYC: 'We all have a role to play' in ... ['Kevin Liptak', 'Cnn White House Producer'] Real
5 Obama weighs in on the debate\n\nPresident Bar... Obama weighs in on the debate ['Brianna Ehley', 'Jack Shafer'] Real
I've tried converting it to strings:
import nltk
import numpy as np
import random
import bs4 as bs
import re
data = df.astype(str)
data
However, when I try to tokenize the text, I get this error:
corpus = nltk.sent_tokenize(data['text'])
TypeError: expected string or bytes-like object
It doesn't seem to work :( Does anybody know how to tokenize the sentences in each row of the ['text'] column?