
I am running a function over my list that includes a dictionary look-up, so I need to remove all non-dictionary words; otherwise I get a KeyError. I can't just use "continue" because I'm not doing this in a loop. I don't think there are many such words, so I can remove them one by one if I have to (although I would prefer not to). The strings in the list are all unicode, which has made removing them more difficult.

My list looks like this:

my_list:
[[u'stuff',
  u'going',
  u'moment',
  u'mj',
  u've',
  u'started',
  u'listening',
  u'music'

etc...

or, if I call it like this I get a single bracket:

my_list[0]:
[u'stuff',
 u'going',
 u'moment',
 u'mj',
 u've',
 u'started',
 u'listening',
 u'music',

etc...

I've tried things like:

my_list.remove("mj")

and

my_list.remove("u'mj'")

and

my_list.remove[0,3]

Any ideas? Thanks

Edit, in response to Kevin: here's how I built the data:

my_list = []
for review in train["review"]:
    my_list.append(review_to_wordlist(review, remove_stopwords=True))

and the function is here:

# Imports assumed from the calls used below (bs4, re, nltk)
import re
from bs4 import BeautifulSoup
from nltk.corpus import stopwords

def review_to_wordlist(review, remove_stopwords=False):
    # remove html
    review_text = BeautifulSoup(review).get_text()

    # remove non-letters
    # possibly update this later to include numbers?
    review_text = re.sub("[^a-zA-Z]", " ", review_text)

    # convert words to lower case and split
    words = review_text.lower().split()

    # optionally drop English stop words
    if remove_stopwords:
        stops = set(stopwords.words("english"))
        words = [w for w in words if w not in stops]

    return words
user57391
  • How did you get the data into that structure in the first place? Is there a reason it's not already in a `set()` or dictionary? – Kevin Jan 26 '15 at 03:14
  • You probably want to be using `my_list.extend()` instead of `my_list.append()`. – Kevin Jan 26 '15 at 03:22

2 Answers


You are close. The problem isn't the unicode; it's that you are calling remove on the outer list, whose elements are the inner lists rather than the words themselves. The words live in the inner list, so that is where you need to call remove.

Do this instead:

my_list[0].remove('mj')

You can also pass a unicode literal by adding the u prefix. For ASCII text like this, 'mj' and u'mj' compare equal in Python 2, so the result is the same:

my_list[0].remove(u'mj')

Example:

my_list = [[u'stuff',
  u'going',
  u'moment',
  u'mj',
  u've',
  u'started',
  u'listening',
  u'music'
  ]]
my_list[0].remove('mj')

print my_list

Outputs:

[[u'stuff', u'going', u'moment', u've', u'started', u'listening', u'music']]

Notice that the string mj is removed.
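
Keep in mind that remove deletes only the first matching element and raises a ValueError when the word is not in the list. If the goal is to drop every non-dictionary word from every inner list, a list comprehension may be easier. The following is only a sketch under assumptions: word_scores is a hypothetical stand-in for your real dictionary, and its values are made up.

```python
# Hypothetical lookup table standing in for the real dictionary.
word_scores = {u'stuff': 1, u'going': 2, u'moment': 3,
               u'started': 4, u'listening': 5, u'music': 6}

my_list = [[u'stuff', u'going', u'moment', u'mj', u've',
            u'started', u'listening', u'music']]

# For each inner list, keep only the words that are keys of the dictionary.
filtered = [[w for w in words if w in word_scores] for words in my_list]
```

This builds new lists instead of mutating the originals, so it is safe to run even when a word appears several times or not at all.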

Andy

You mentioned that you were using the list for a key lookup.

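Guard the lookup with a membership test so that words missing from the dictionary are skipped instead of raising a KeyError. Here my_dict stands for your dictionary. (dict.has_key() also works in Python 2, but it is deprecated and was removed in Python 3.)

if list_item in my_dict:
    # do your lookup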
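
As a rough illustration of guarding the lookup, here are two common patterns. The names word_scores and words, and the default value 0, are assumptions made for the example rather than anything from the question.

```python
# Hypothetical dictionary and word list for illustration.
word_scores = {u'stuff': 1, u'music': 6}
words = [u'stuff', u'mj', u'music']

# Option 1: skip words that are not keys of the dictionary.
found = [word_scores[w] for w in words if w in word_scores]

# Option 2: dict.get returns a default instead of raising KeyError.
with_default = [word_scores.get(w, 0) for w in words]
```

The first option silently drops unknown words; the second keeps one result per input word, which can matter if positions need to line up with the original list.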

Vedaad Shakib