I'm looking for a more efficient way of loading text data into Python than calling .readlines() and then manually parsing through the data. My goal here is to run different models on the text.
My class labels are people's names, which are listed before the text of their... let's call them 'Reviews'... which are separated by lines containing ***. Here is an example of the txt file:
Mike P, Review, December, 2013
Mike P, Review, June, 2013
Tom A, Review, December, 2013
Tom A, Review, June, 2013
Mark D, Review, December, 2013
Mark D, Review, June, 2012
Sally M, Review, December, 2011
***
This is Mike P's first review
***
This is Mike P's second review
***
This is Tom A's first review
***
Etc...
Ultimately, I need to create a bag of words from the 'Reviews'. I can do this in R, but I'm forcing myself to learn Python for data analysis, and I keep spinning my wheels whichever way I turn.
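To make the target concrete, here is a rough sketch of the kind of result I'm after, using only the standard library. The function names `parse_reviews` and `bag_of_words` are made up for illustration, the tokenization is deliberately crude, and it assumes the i-th header line belongs to the i-th review, which may not hold for the real file.

```python
import re
from collections import Counter


def parse_reviews(text):
    """Return (names, reviews) from text whose blocks are separated by *** lines."""
    # Split on lines that contain only *** (optionally followed by spaces/tabs).
    chunks = re.split(r"^\*\*\*[ \t]*$", text, flags=re.MULTILINE)
    chunks = [c.strip() for c in chunks if c.strip()]
    # Assumption: the first block is the header list, the rest are reviews.
    header_lines = chunks[0].splitlines()
    names = [line.split(",")[0].strip() for line in header_lines]
    return names, chunks[1:]


def bag_of_words(review):
    """Count lowercase word tokens (letters and apostrophes only)."""
    return Counter(re.findall(r"[a-z']+", review.lower()))


sample = """Mike P, Review, December, 2013
Mike P, Review, June, 2013
Tom A, Review, December, 2013
***
This is Mike P's first review
***
This is Mike P's second review
***
This is Tom A's first review
***
"""

names, reviews = parse_reviews(sample)
# Assumption: header lines and reviews appear in the same order.
labelled = [(name, bag_of_words(review)) for name, review in zip(names, reviews)]
```

With the real file, `sample` would be replaced by the result of `open(path).read()`, and `labelled` would hold one (name, word counts) pair per review.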
Thanks in advance!