Remove offending characters from strings in list

Question

Sample data to parse (a list of unicode strings):

[u'\n', u'1\xa0', u'Some text here.', u'\n', u'1\xa0', u'Some more text here.', 
u'\n', u'1\xa0', u'Some more text here.']

I want to remove \xa0 from these strings.

EDIT: Current Method Not Working:

def remove_from_list(l, x):
  return [li.replace(x, '') for li in l]

remove_from_list(list, u'\xa0')

I'm still getting the exact same output.

Check these, http://stackoverflow.com/questions/3939361/remove-specific-characters-from-a-string-in-python, http://www.tutorialspoint.com/python/string_replace.htm — Rupak, May 17 '13 at 19:09
Which part of this do you not know how to do? How to turn `u'1\xa0'` into `u'1'`? Or how to do the same thing for each element in a list? – abarnert, May 17 '13 at 19:11
OK, your problem is _exactly_ the same as the one Rupak linked to: `replace` returns a new string, it doesn't mutate a string in-place. — abarnert, May 17 '13 at 19:13
@abarnert see updated code, I am doing what the other post recommends for regex and it isn't working — Dan, May 17 '13 at 19:20
@DanO'Day: The updated code has either the same problem (`re.sub` _also_ doesn't modify its argument in any way, it just returns a new string) or a related one (rebinding `li` doesn't do anything to whatever `li` used to be bound to). — abarnert, May 17 '13 at 19:21
Also, you don't need to use `re.sub` instead of `replace` here. The other post only does that as a shortcut to replace multiple different characters at once; you only have one character to replace. — abarnert, May 17 '13 at 19:28
The new `remove_from_list(list, u'\xa0')` very definitely works. Print out what it returns; the `\xa0` characters are all gone. But if you were expecting it to modify the `list` variable in-place, it's not going to do that. The answer you got it from explains why. — abarnert, May 17 '13 at 19:34
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/30151/discussion-between-dan-oday-and-abarnert) — Dan, May 17 '13 at 19:39

abarnert · Accepted Answer · 2013-05-17T19:56:24.580

The problem is different in each version of your code. Let's start with this:

newli = re.sub(x, '', li)
l[li].replace(newli)

First, newli is already the line you want—that's what re.sub does—so you don't need replace here at all. Just assign newli.

Second, l[li] isn't going to work, because li is the value of the line, not the index.

In this version, it's a bit more subtle:

li = re.sub(x, '', li)

re.sub is returning a new string, and you're assigning that string to li. But that doesn't affect anything in the list, it's just saying "li no longer refers to the current line in the list, it now refers to this new string".
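
A minimal sketch of that rebinding behaviour, assuming a small sample list named `lines` (a name chosen here for illustration, not taken from the question):

```python
lines = [u'\n', u'1\xa0']

for li in lines:
    # This rebinds the loop variable `li` only; the list keeps
    # referring to the original, unmodified strings.
    li = li.replace(u'\xa0', u'')

print(lines == [u'\n', u'1\xa0'])  # True: the list is unchanged
```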

The straightforward way to replace the list elements is to get the index so you can use the [] operator. And to get that, you want to use enumerate.

So:

import re

def remove_from_list(l, x):
  for index, li in enumerate(l):
    # Assign through the index so the list itself is updated.
    l[index] = re.sub(x, '', li)
  return l

But really, you probably do want to use str.replace—it's just that you want to use it instead of re.sub:

def remove_from_list(l, x):
  for index, li in enumerate(l):
    l[index] = li.replace(x, '')
  return l

Then you don't have to worry about what happens if x is a special character in regular expressions.
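
To illustrate the regex concern, here is a small sketch assuming a sample string `text` (hypothetical, not from the question). A pattern such as `.` means "any character" to `re.sub`, while `str.replace` always treats it literally; `re.escape` is the standard-library way to get literal matching if a regex is needed anyway:

```python
import re

text = u'a.b.c'

regex_result = re.sub(u'.', u'', text)               # '.' matches any character
escaped_result = re.sub(re.escape(u'.'), u'', text)  # '.' is matched literally
replace_result = text.replace(u'.', u'')             # str.replace is always literal

print(regex_result == u'')       # True: everything was removed
print(escaped_result == u'abc')  # True
print(replace_result == u'abc')  # True
```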

Also, in Python, you almost never want to modify an object in-place, and also return it. Either modify it and return None, or return a new copy of the object. So, either:

def remove_from_list(l, x):
  for index, li in enumerate(l):
    newli = li.replace(x, '')
    l[index] = newli

… or:

def remove_from_list(l, x):
  new_list = []
  for li in l:
    newli = li.replace(x, '')
    new_list.append(newli)
  return new_list

And you can simplify the latter to a list comprehension, as in unutbu's answer:

def remove_from_list(l, x):
  new_list = [li.replace(x, '') for li in l]
  return new_list

The fact that the second one is easier to write (no need for enumerate, has a handy shortcut, etc.) is no coincidence—it's usually the one you want, so Python makes it easy.
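
If in-place modification really is what you need, slice assignment is one possible way to keep the comprehension. This is a variation that does not appear in the versions above, and the function name `strip_in_place` is made up for this sketch:

```python
def strip_in_place(l, x):
    # Slice assignment replaces the contents of the existing list object,
    # so every other reference to that list sees the change.
    # Following the convention above, it modifies and returns None.
    l[:] = [li.replace(x, u'') for li in l]

a = [u'\n', u'1\xa0']
alias = a                     # a second name for the same list
result = strip_in_place(a, u'\xa0')

print(alias == [u'\n', u'1'])  # True: the shared list was updated
print(result is None)          # True
```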

I don't know how else to make this clearer, but one last try:

If you choose the version that returns a fixed-up new copy of the list instead of modifying the list in-place, your original list will not be modified in any way. If you want to use the fixed-up new copy, you have to use the return value of the function. For example:

>>> def remove_from_list(l, x):
...     new_list = [li.replace(x, '') for li in l]
...     return new_list
>>> a = [u'\n', u'1\xa0']
>>> b = remove_from_list(a, u'\xa0')
>>> a
[u'\n', u'1\xa0']
>>> b
[u'\n', u'1']

The problem you're having with your actual code turning everything into a list of 1-character and 0-character strings is that you don't actually have a list of strings in the first place, you have one string that's a repr of a list of strings. So, `for li in l` means "for each character `li` in the string `l`", instead of "for each string `li` in the list `l`".
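
A sketch of that situation, assuming the data arrived as a string named `raw` (hypothetical). Using `ast.literal_eval` to recover the list is an assumption about what might help here, not something taken from the question; fixing whatever code produced the string in the first place is usually the better option:

```python
import ast

# One string that merely looks like a list of strings.
raw = "[u'\\n', u'1\\xa0']"

# Iterating over a string yields single characters, not list items.
first_chars = [c for c in raw][:3]

# Assumption: the string is a valid Python literal, so it can be parsed
# back into a real list before cleaning it.
parsed = ast.literal_eval(raw)
cleaned = [s.replace(u'\xa0', u'') for s in parsed]

print(first_chars == ['[', 'u', "'"])  # True
print(cleaned == [u'\n', u'1'])        # True
```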

For some reason it still isn't working. I am using `return [li.replace(x, '') for li in l]` based on your last line but it still has those characters in place. — Dan, May 17 '13 at 19:30
I just updated the answer to show what I did based on this answer. — Dan, May 17 '13 at 19:32
This won't modify `l` in-place, it will return a new list with those characters stripped out of each string. You have to print that new list, or assign it to something, or whatever. — abarnert, May 17 '13 at 19:33
I am, just not showing in my example - I'll update my question to show you. — Dan, May 17 '13 at 19:34

Jon Clements · Answer 2 · 2013-05-17T19:29:07.617

Another option if you're only interested in ASCII chars (as you mention characters, but this also happens to work for the case of the posted example):

[text.encode('ascii', 'ignore') for text in your_list]
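
Two things are worth checking before relying on this, shown in the sketch below with a made-up sample list: `'ignore'` drops every non-ASCII character, not just `\xa0`, and on Python 3 `encode` returns `bytes`, so a `decode` step is needed to get text back.

```python
your_list = [u'1\xa0', u'caf\xe9']  # sample data, assumed for illustration

# Every character outside ASCII is silently dropped, including the e-acute.
encoded = [text.encode('ascii', 'ignore') for text in your_list]

# On Python 3 the items above are bytes; decode to return to text.
decoded = [b.decode('ascii') for b in encoded]

print(decoded == [u'1', u'caf'])  # True
```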

edited May 17 '13 at 19:29

answered May 17 '13 at 19:22

unutbu · Answer 3 · 2013-05-17T19:18:04.300

You could use a list comprehension and str.replace:

>>> items
[u'\n',
 u'1\xa0',
 u'Some text here.',
 u'\n',
 u'1\xa0',
 u'Some more text here.',
 u'\n',
 u'1\xa0',
 u'Some more text here.']
>>> [item.replace(u'\xa0', u'') for item in items]
[u'\n',
 u'1',
 u'Some text here.',
 u'\n',
 u'1',
 u'Some more text here.',
 u'\n',
 u'1',
 u'Some more text here.']

edited May 17 '13 at 19:18

answered May 17 '13 at 19:10

@DanO'Day: _What_ valid characters do you want to maintain that this version doesn't? This retains everything except for `\xa0`, which is exactly what you asked for. – abarnert May 17 '13 at 19:22
@DanO'Day: The code didn't change. – Matthias May 17 '13 at 19:24
@Matthias my bad, still not working though – Dan May 17 '13 at 19:37
What does "not working" mean? When you run this exact code in your Python interpreter, do you get different results than unutbu showed? Or are the results unutbu showed wrong in some way? – abarnert May 17 '13 at 19:40
