Python Replacing Values in a List

Question

I have a list that looks like this:

stuff = ['\n', '<td><nobr>8h</nobr></td>', '\n', '<td><nobr>2021-04-02 14:27:44.729</nobr></td>', '\n', '<td class="text-right">1.73</td>;', '\n']

I am trying to clean it up so that it looks like this:

stuff = ["8h","2021-04-02 14:27:44.729","1.73"]

What I am trying to do is this:

for x in range(0,len(stuff),1):
    stuff[x] = stuff[x].replace("\n","")
    stuff[x] = stuff[x].replace("<td>","")

I am hoping to remove the characters if they are there. If not, I'm hoping that part will just be skipped.

The error message I am getting is

TypeError: 'NoneType' object is not callable

Any suggestions?

Edit #1:

I believe this has something to do with the \n values messing things up. I'm not sure if this is accurate, but that's my feeling.

Why `for x in range(0,len(stuff),1):` instead of `for x in stuff:`? Also, this could help: [Python code to remove HTML tags from a string](https://stackoverflow.com/questions/9662346/python-code-to-remove-html-tags-from-a-string). — GG., Apr 02 '21 at 23:43
I'll take a look at the link, but using for x in range(0,len(stuff),1) is just how I've always done it. Is there a reason to use one over the other? — Chicken Sandwich No Pickles, Apr 02 '21 at 23:45
I wonder whether you accidentally set stuff to None before you hit the loop. Have you tried stepping through the code with breakpoints and debugging it? Also, I am presuming that in your actual code, the second item in the array stuff is also a string. Right now only the \n is a string. — Druhin Bala, Apr 02 '21 at 23:46
`for x in stuff` is cleaner - unless you specifically need the index for computation — Druhin Bala, Apr 02 '21 at 23:47
You could use BeautifulSoup if you already have it installed (it looks like you web-scraped this data), and then get the text from each element of your list: soup = BeautifulSoup("<td><nobr>8h</nobr></td>", "lxml"); soup.find("td").text — Je Je, Apr 02 '21 at 23:54
Does this answer your question? [Strip HTML from strings in Python](https://stackoverflow.com/questions/753052/strip-html-from-strings-in-python) (strip html using answers from this question, then use [strip](https://docs.python.org/3.4/library/stdtypes.html?highlight=strip#str.strip) to remove the \n etc.) — Stuart, Apr 03 '21 at 00:55
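
A minimal sketch of the idea raised in the comments above: strip the markup with the standard library's html.parser, then use strip to drop the leftover newlines and the trailing semicolon. TextExtractor and strip_tags are hypothetical names chosen for illustration, and the sketch assumes every element of stuff is a plain str.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps only the text found between tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text outside of a tag.
        self.parts.append(data)


def strip_tags(fragment):
    """Return the text content of an HTML fragment."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts)


stuff = ['\n', '<td><nobr>8h</nobr></td>', '\n',
         '<td><nobr>2021-04-02 14:27:44.729</nobr></td>', '\n',
         '<td class="text-right">1.73</td>;', '\n']

# Remove tags first, then trim newlines, semicolons and spaces at the ends.
cleaned = [strip_tags(s).strip("\n; ") for s in stuff]
# Drop the entries that became empty (the bare "\n" items).
cleaned = [s for s in cleaned if s]
print(cleaned)  # ['8h', '2021-04-02 14:27:44.729', '1.73']
```

Building a new list with a comprehension also sidesteps the index-based loop, which is only needed when the list has to be modified in place.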

TheFaultInOurStars · Answer 1 · 2021-04-03T00:09:09.413

I should say I'm definitely not proud of my code, but here is what I came up with:

import re
stuff = ['\n', '<td><nobr>8h</nobr></td>', '\n', '<td><nobr>2021-04-02 14:27:44.729</nobr></td>', '\n', '<td class="text-right">1.73</td>;', '\n']
def get_stuff(el):
    # Raw strings pass backslashes through to the regex engine; "/" needs no escaping.
    pattern1 = r"<td><nobr>(?P<inner>.+)</nobr></td>"
    pattern2 = r'<td class=(\s+)?".+"(\s+)?>(?P<inner>.+)</td>'
    result1 = re.search(pattern1, el)
    result2 = re.search(pattern2, el)
    if result1:
        return result1.group("inner")
    if result2:
        return result2.group("inner")
    return None  # no table cell found, e.g. for the bare '\n' entries
last_list = list(map(get_stuff, stuff))
print([x for x in last_list if x is not None])

Result

['8h', '2021-04-02 14:27:44.729', '1.73']

Update

So I came up with a better idea (still not proud of it):

import re
stuff = ['\n', '<td><nobr>8h</nobr></td>', '\n', '<td><nobr>2021-04-02 14:27:44.729</nobr></td>', '\n', '<td class="text-right">1.73</td>;', '\n']
def get_stuff(el):
    # Remove <nobr>/<td> tags (with an optional class attribute), newlines and semicolons.
    pattern = r'</?nobr>|</?td(\s+)?(class(\s+)?=(\s+)?".+"(\s?))?>|\n|;'
    return re.sub(pattern, "", el)
last_list = list(map(get_stuff, stuff))
print([x for x in last_list if x != ''])

Result (still the same):

['8h', '2021-04-02 14:27:44.729', '1.73']

I'll play with it. Whatever solution anyone has is still better than what I currently have. Thanks — Chicken Sandwich No Pickles, Apr 03 '21 at 00:08
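
One caveat with the updated pattern above: the greedy ".+" inside the class attribute can run past the end of one tag when a single string holds more than one cell. A possible variation, offered as a sketch and not as the answer's own method, matches a whole tag with [^>]* so that any attributes are covered without crossing a ">". The names TAG_OR_JUNK and clean are hypothetical.

```python
import re

# Hypothetical pattern: any opening or closing <td>/<nobr> tag with arbitrary
# attributes, or a stray newline or semicolon.
TAG_OR_JUNK = re.compile(r"</?(?:td|nobr)\b[^>]*>|[\n;]")


def clean(el):
    """Remove the tags and junk characters matched by TAG_OR_JUNK."""
    return TAG_OR_JUNK.sub("", el)


stuff = ['\n', '<td><nobr>8h</nobr></td>', '\n',
         '<td><nobr>2021-04-02 14:27:44.729</nobr></td>', '\n',
         '<td class="text-right">1.73</td>;', '\n']

cleaned = [c for c in map(clean, stuff) if c]
print(cleaned)  # ['8h', '2021-04-02 14:27:44.729', '1.73']

# Two cells in one string: both values survive because [^>]* stops at the
# first ">" instead of running on to a later quote.
print(clean('<td class="a">x</td><td class="b">y</td>'))  # xy
```

Compiling the pattern once also avoids rebuilding it for every list element.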

score 1 · Answer 2 · answered Apr 03 '21 at 00:36

If my understanding is correct, you want to remove two kinds of content:

anything between < and >;
a list of undesirable characters, e.g. \n or ;.

The snippet below does the job.


import re

stuff = ['\n', '<td><nobr>8h</nobr></td>', '\n', '<td><nobr>2021-04-02 14:27:44.729</nobr></td>', '\n', '<td class="text-right">1.73</td>;', '\n']

ans = []
for x in stuff:
    x = re.sub(r"<.*?>", "", x)  # remove anything between < and >
    x = re.sub(r"(\n|;)", "", x)  # remove unwanted characters
    if x:  # keep only non-empty results
        ans.append(x)

print(ans)
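
The "?" in "<.*?>" matters here. A short illustration of the difference between the greedy and the non-greedy form on one of the list elements (the variable names are only for this example):

```python
import re

cell = '<td><nobr>8h</nobr></td>'

# Greedy: ".*" runs from the first "<" to the last ">", so the text is lost.
greedy = re.sub(r"<.*>", "", cell)

# Non-greedy: ".*?" stops at the nearest ">", so each tag is removed on its own.
lazy = re.sub(r"<.*?>", "", cell)

print(repr(greedy))  # ''
print(repr(lazy))    # '8h'
```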
