Replacing parts of strings in a list in Python

Question

I know similar questions exist for this topic but I've gone through them and still couldn't get it.

My python program retrieves a subsection of html from a page using a regular expression. I just realised that I hadn't accounted for html special characters getting in the way.

say I have:

regex_title = ['I went to the store', 'Itlt&#039;s a nice day today', 'I went home for a rest']

I obviously want to change lt&#039; to a single quote '.

I've tried variations of:

for each in regex_title:
    if 'lt&#039;' in regex_title:
        str.replace("lt&#039;", "'")

but had no success. What am I missing?

NOTE: The purpose is to do this without importing any more modules.

This seems odd to me. unescaping that html should leave you with `Itlt's`, not `It's`... — mgilson, Oct 03 '14 at 06:23
Also note that [there may be a better way](http://stackoverflow.com/a/2087433/748858) ... — mgilson, Oct 03 '14 at 06:24

score 3 · Accepted Answer · answered Oct 03 '14 at 06:19

str.replace does not replace in place. It returns a new string with the replacement applied, so you need to assign the return value back.

>>> regex_title = ['I went to the store', 'Itlt&#039;s a nice day today',
...                'I went home for a rest']
>>> regex_title = [s.replace("lt&#039;", "'") for s in regex_title]
>>> regex_title
['I went to the store', "It's a nice day today", 'I went home for a rest']
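
A minimal sketch of why the loop in the question had no effect, assuming Python 3: strings are immutable, so `replace` leaves the original untouched and hands back a new string. The variable names below are illustrative only.

```python
title = "Itlt&#039;s a nice day today"

# Calling replace without using the result changes nothing.
title.replace("lt&#039;", "'")
unchanged = title

# Binding the return value captures the new string.
fixed = title.replace("lt&#039;", "'")

print(unchanged)  # Itlt&#039;s a nice day today
print(fixed)      # It's a nice day today
```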

falsetru


Excellent. Thank you. I didn't realise I had to reassign the variable. – doxyl Oct 03 '14 at 06:29

score 2 · Answer 2 · answered Oct 03 '14 at 06:24

If your task is to unescape HTML, it is better to use the unescape function:

>>> ll = ['I went to the store', 'Itlt&#039;s a nice day today', 'I went home for a rest']
>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> print map(h.unescape, ll)
['I went to the store', u"Itlt's a nice day today", 'I went home for a rest']
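
For comparison, here is a sketch of the same call on Python 3.4 or later, where `html.unescape` takes the place of `HTMLParser.unescape`. Unescaping only decodes `&#039;`, so the stray `lt` stays in the string, as in the output above.

```python
from html import unescape

ll = ['I went to the store', 'Itlt&#039;s a nice day today', 'I went home for a rest']

# map returns an iterator on Python 3, so wrap it in list() to see the values.
unescaped = list(map(unescape, ll))
print(unescaped)
```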

stalk


+1 for recommending `map()`. Though I'd probably suggest list comprehensions to a Python newbie as the OP appears to be, since their syntax looks a bit more like the `for` loop that his instinct will be pushing him to write. – rmunn Oct 03 '14 at 06:28

score 1 · Answer 3 · edited Oct 03 '14 at 06:25

You need to change your code to this:

for each in regex_title:
    if 'lt&#039;' in each:
        each.replace("lt&#039;", "'")

But this doesn't change your list, so you need to assign the replaced string back to its index in the list:

>>> for each in regex_title:
...     if 'lt&#039;' in each:
...         regex_title[regex_title.index(each)] = each.replace("lt&#039;", "'")
...
>>> regex_title
['I went to the store', "It's a nice day today", 'I went home for a rest']

edited Oct 03 '14 at 06:25

answered Oct 03 '14 at 06:19

Mazdak


There's a MUCH better way to do this, using list comprehensions: `new_list = [s.replace("lt&#039;", "'") for s in old_list]`. But as I said in my answer, better to use HTMLParser. – rmunn Oct 03 '14 at 06:25
Thanks for the reminder. I know there are many ways to do this, but I first tried to refine the OP's idea; I will add other ways afterwards. – Mazdak Oct 03 '14 at 06:27
If you need indexes, use `enumerate`. Using `list.index` inside a loop is not efficient. – falsetru Oct 03 '14 at 06:30
I think using `enumerate` in this case decreases performance, because it has to keep track of an index on every iteration while we need just one index. – Mazdak Oct 03 '14 at 06:36
I looked at that and didn't find anything about our topic. – Mazdak Oct 03 '14 at 06:42
`list.index` has O(n) time complexity (same for `x in list`). – falsetru Oct 03 '14 at 06:52
Yes, exactly! But there isn't anything there comparing `enumerate` and `list.index`; I think it deserves a good benchmark. Anyway, thanks for the suggestion. – Mazdak Oct 03 '14 at 07:01
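
The `enumerate` suggestion from the comments could look like the sketch below (Python 3 syntax). It carries the position along with each element, so the list does not have to be rescanned with `list.index` for every match. No benchmark is claimed here.

```python
regex_title = ['I went to the store', 'Itlt&#039;s a nice day today',
               'I went home for a rest']

# enumerate yields (index, element) pairs, so the index is already known.
for i, each in enumerate(regex_title):
    if 'lt&#039;' in each:
        regex_title[i] = each.replace("lt&#039;", "'")

print(regex_title)
```

Assigning to an existing index while iterating is safe here because the length of the list never changes.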

score 1 · Answer 4 · edited May 23 '17 at 12:03

You don't explain why you want to avoid importing standard library modules. There are very few good reasons to deny yourself the use of Python's included batteries; unless you have such a reason (and if you do, you should state it), you should use the functionality provided to you.

In this case, it's the unescape() function from the html module: ¹

from html import unescape

titles = [
    'I went to the store',
    'It&#039;s a nice day today',
    'I went home for a rest'
]

fixed = [unescape(s) for s in titles]

>>> fixed
['I went to the store', "It's a nice day today", 'I went home for a rest']

Reimplementing html.unescape() yourself is:

1. Pointless.
2. Error-prone.
3. Going to mean constantly going back and adding new cases when new HTML entities crop up in your data.

¹ Since Python 3.4, anyway. For previous versions, use HTMLParser.HTMLParser.unescape() as per @stalk's answer.

It was the requirement of a project to not import other modules. I'm sure this will be useful for someone in the future though, so there you go. Cheers. — doxyl, Oct 03 '14 at 06:59
Worth pointing out that this code is for Python 3, not Python 2 (the `html` module is in Python 3 only). — stalk, Oct 03 '14 at 07:00
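
For readers bound by the same no-imports requirement, a hand-rolled fallback might look like the sketch below. It is exactly the kind of partial reimplementation this answer warns against: the entity table is a hypothetical, deliberately incomplete example, and `unescape_basic` is an illustrative name, not a library function.

```python
# Hypothetical, incomplete table of entities; extend it for your own data.
ENTITY_REPLACEMENTS = [
    ("&#039;", "'"),
    ("&#39;", "'"),
    ("&quot;", '"'),
    ("&lt;", "<"),
    ("&gt;", ">"),
    ("&amp;", "&"),  # last, so "&amp;lt;" becomes "&lt;" rather than "<"
]


def unescape_basic(text):
    """Replace the few entities listed above; anything else is left as-is."""
    for entity, char in ENTITY_REPLACEMENTS:
        text = text.replace(entity, char)
    return text


titles = ['It&#039;s a nice day today', 'Fish &amp; chips', '&amp;lt;']
fixed = [unescape_basic(s) for s in titles]
print(fixed)
```

Handling `&amp;` last is the one design choice that matters here: replacing it first would turn an escaped `&amp;lt;` into `<` instead of the literal text `&lt;`.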

score 0 · Answer 5 · edited May 23 '17 at 10:33

Instead of doing this yourself, you'd be better off using the HTMLParser library, as described in https://stackoverflow.com/a/2087433/2314532. Read that question and answer for all the details, but the summary is:

import HTMLParser
parser = HTMLParser.HTMLParser()
print parser.unescape('&#039;')
# Will print a single ' character

So in your case, you'd want to do something like:

import HTMLParser
parser = HTMLParser.HTMLParser()
new_titles = [parser.unescape(s) for s in regex_title]

That will unescape any HTML escape, not just the &#039; escape that you asked about, and process the entire list all at once.

score 0 · Answer 6 · answered Oct 03 '14 at 08:23

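Try it like this:

regex_title = ['I went to the store', 'Itlt&#039;s a nice day today', 'I went home for a rest']
# Join with a separator and split on the same one; this assumes no title contains a comma.
joined = ','.join(regex_title)  # named "joined" to avoid shadowing the built-in str
replaced = joined.replace("lt&#039;", "'")
print(replaced.split(','))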

Hussain Shabbir
