I am running a function over my list that does a dictionary look-up, and I get a KeyError whenever a word is not in the dictionary, so I need to remove all of the non-dictionary words first. I can't just use "continue" because I'm not doing this in a loop. I don't think there are very many such words, so I can remove them one by one if I have to (although I would prefer not to). The objects in the list are all unicode strings, which has made removing them more difficult.
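As a point of reference, one way to avoid the KeyError without removing anything is to make the look-up itself tolerant of missing words. This is only a minimal sketch: `word_scores` and `score_words` are hypothetical stand-ins for the real dictionary and function, neither of which appears in this post.

```python
# `word_scores` and `score_words` are hypothetical stand-ins for the real
# lookup dict and the function that is run over the list.
word_scores = {u'stuff': 0.2, u'music': 0.9}

def score_words(words, default=0.0):
    # dict.get returns `default` for a missing key instead of raising KeyError
    return [word_scores.get(w, default) for w in words]

print(score_words([u'stuff', u'mj', u'music']))  # [0.2, 0.0, 0.9]
```

Whether a default value is acceptable depends on what the look-up feeds into; if missing words must be dropped entirely, filtering the list is still needed.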
My list looks like this:
my_list:
[[u'stuff',
u'going',
u'moment',
u'mj',
u've',
u'started',
u'listening',
u'music',
etc...
Or, if I index into it like this, I get a single level of brackets:
my_list[0]:
[u'stuff',
u'going',
u'moment',
u'mj',
u've',
u'started',
u'listening',
u'music',
etc...
I've tried things like:
my_list.remove("mj")
and
my_list.remove("u'mj'")
and
my_list.remove[0,3]
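For context, here is a small sketch of why those calls probably fail, using a shortened copy of the list above. The outer list holds inner lists rather than words, so `remove` on it never finds a word. `"u'mj'"` is a different string that literally contains the `u` and the quotes. `remove[0,3]` subscripts the method instead of calling it, which raises a TypeError. The variable `outer_remove_worked` is only there for illustration.

```python
# Shortened copy of the nested list from the question.
my_list = [[u'stuff', u'going', u'moment', u'mj', u've', u'started']]

# The outer list contains one inner list, not the words themselves,
# so searching it for a word raises ValueError.
try:
    my_list.remove(u'mj')
    outer_remove_worked = True
except ValueError:
    outer_remove_worked = False

# Targeting the inner list works; remove() drops only the first match.
my_list[0].remove(u'mj')

print(outer_remove_worked)
print(u'mj' in my_list[0])
```

Because `remove` deletes one occurrence at a time and raises ValueError when the word is absent, it gets awkward quickly when the same word appears in many inner lists.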
Any ideas? Thanks
Edit (in response to Kevin): here's how I built the data:
my_list = []
for review in train["review"]:
    my_list.append(review_to_wordlist(review, remove_stopwords=True))
and the function is here:
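# imports assumed from the calls below (re, BeautifulSoup via bs4, NLTK stopwords)
import re
from bs4 import BeautifulSoup
from nltk.corpus import stopwords

def review_to_wordlist(review, remove_stopwords=False):
    # remove HTML
    review_text = BeautifulSoup(review).get_text()
    # remove non-letters
    # possibly update this later to include numbers?
    review_text = re.sub("[^a-zA-Z]", " ", review_text)
    # convert words to lower case and split
    words = review_text.lower().split()
    if remove_stopwords:
        # drop English stop words; a set makes the membership test fast
        stops = set(stopwords.words("english"))
        words = [w for w in words if w not in stops]
    return words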
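Since the lists are built one review at a time, one possible approach is to drop the unknown words in that same step. The sketch below is a rough illustration, not tested against the real data: `vocabulary` stands in for the keys of the look-up dictionary, `tokenized_reviews` mimics what `review_to_wordlist` returns, and `keep_known_words` is a hypothetical helper.

```python
# Hypothetical stand-ins: `vocabulary` plays the role of the lookup dict's
# keys, and `tokenized_reviews` mimics the output of review_to_wordlist.
vocabulary = {u'stuff', u'going', u'moment', u'started', u'listening', u'music'}
tokenized_reviews = [
    [u'stuff', u'going', u'moment', u'mj', u've'],
    [u'started', u'listening', u'music', u'mj'],
]

def keep_known_words(words, known):
    """Return only the words present in `known`, preserving order."""
    return [w for w in words if w in known]

# Build the nested list with unknown words already filtered out.
my_list = [keep_known_words(words, vocabulary) for words in tokenized_reviews]
print(my_list)
```

With the real code, the same helper could wrap each `review_to_wordlist(...)` call, passing the dictionary itself as `known`, since `in` on a dict checks its keys.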