NLTK's CategorizedPlaintextCorpusReader object isn't a data type that pandas can turn into a DataFrame directly.
That said, you can convert the movie reviews into a list of tuples and then build a DataFrame from it:
```python
import pandas as pd
from nltk.corpus import movie_reviews as mr

reviews = []
for fileid in mr.fileids():
    # Each fileid looks like 'neg/cv000_29416.txt': the category, then the file name.
    tag, filename = fileid.split('/')
    reviews.append((filename, tag, mr.raw(fileid)))

df = pd.DataFrame(reviews, columns=['filename', 'tag', 'text'])
```
[out]:

```
>>> df.head()
          filename  tag                                               text
0  cv000_29416.txt  neg  plot : two teen couples go to a church party ,...
1  cv001_19502.txt  neg  the happy bastard's quick movie review \ndamn ...
2  cv002_17424.txt  neg  it is movies like these that make a jaded movi...
3  cv003_12683.txt  neg  " quest for camelot " is warner bros . ' firs...
4  cv004_12641.txt  neg  synopsis : a mentally unstable man undergoing ...
```
To process the text column, see "How to NLTK word_tokenize to a Pandas dataframe for Twitter data?"
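As a rough sketch of what processing the text column can look like: the movie_reviews text shown in the df.head() output appears to be lowercased and already split into tokens separated by spaces, so pandas' own str.split may be enough to get token lists. This is an assumption based on that output, not something checked against the whole corpus. The two rows below are short prefixes copied from the output above, used only so the example runs without the NLTK data.

```python
import pandas as pd

# Stand-in rows: the text values are short prefixes taken from the df.head()
# output above, not the full reviews (illustration only).
df = pd.DataFrame(
    [
        ("cv000_29416.txt", "neg", "plot : two teen couples go to a church party ,"),
        ("cv002_17424.txt", "neg", "it is movies like these that make a jaded"),
    ],
    columns=["filename", "tag", "text"],
)

# Assumes the text is already tokenized with spaces, so a whitespace split
# yields one token per list element (punctuation included).
df["tokens"] = df["text"].str.split()

# Number of tokens in each review.
df["n_tokens"] = df["tokens"].str.len()

print(df[["filename", "n_tokens"]])
```

If your text is not pre-tokenized, df['text'].apply(nltk.word_tokenize) is the usual alternative, at the cost of downloading NLTK's tokenizer models first.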