How to string replace using lists/arrays in Python?

Question

I have the following inputs and desired outputs for text I want to replace in an HTML document, possibly using regular expressions or string replacement.

Examples:
input: '<b>º </b>' 
output: ['º']

input: '<b>Nº </b>' 
output: []

input: '<b>1º </b>' 
output: []

input: '<b>1ª </b>' 
output: []

input: '<p>N<u>º </u></p>' 
output: ['º']

Attempt

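import re
from bs4 import BeautifulSoup

l = [('<b>º </b>', ['º']), ('<b>Nº </b>', [])]

for html, expected in l:
    # flags must be passed by keyword; the fourth positional argument of re.sub is count
    codigo = re.sub(r'<(b|sup|s|u)>\s*[oº]\s*</(b|sup|s|u)>', 'º ', html, flags=re.I)
    soup = BeautifulSoup(codigo, 'html.parser')
    result = soup.find_all('b', string='º')
    # compare the matched text with this case's expected output, not with l[1]
    assert [tag.get_text() for tag in result] == expected, "ops.."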

How do I solve this problem?

You can post this to the Portuguese version of StackOverflow : https://pt.stackoverflow.com/ — Yassin Hajaj, May 12 '19 at 02:08
I'm voting to close this question as off-topic because it belongs to https://pt.stackoverflow.com/ — Yassin Hajaj, May 12 '19 at 02:08
@Emma, into a list. I am working with a little test of inputs and desired outputs. — britodfbr, May 12 '19 at 02:21
@Emma, I added other examples of inputs and desired outputs. Is it better now? — britodfbr, May 12 '19 at 03:29
Obligatory reference: https://stackoverflow.com/q/1732348/2988730 — Mad Physicist, May 12 '19 at 03:54

score 0 · Answer 1 · answered May 12 '19 at 11:28

I would try this: first, add your inputs to a list:

codi = ['<b>º </b>' ,'<b>Nº </b>' ,'<b>1º </b>', '<b>1ª </b>','<p>N<u>º </u></p>'  ]

Then process the list with BS:

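from bs4 import BeautifulSoup as bs

for i in codi:
    soup = bs(i, 'html.parser')
    print('input:', i)
    # every element whose text contains º (newer soupsieve releases prefer ':-soup-contains(º)')
    targets = soup.select('*:contains(º)')
    for target in targets:
        # keep only the elements whose whole text is the symbol itself
        if target.text.strip() == 'º':
            print('output:', target.text.strip())
    print('--------------')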

Output:

input: <b>º </b>
output: º
--------------
input: <b>Nº </b>
--------------
input: <b>1º </b>
--------------
input: <b>1ª </b>
--------------
input: <p>N<u>º </u></p>
output: º
--------------
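
Since the question states the desired outputs as lists, the same element-by-element check can also be wrapped in a function that returns a list. The block below is only a minimal sketch, and it swaps in the standard library's html.parser so that it runs without BeautifulSoup; it is not the soup.select() code above. The names OrdinalFinder and find_symbol are made up for illustration, and the sketch assumes every opening tag has a matching closing tag (no void elements such as br).

```python
from html.parser import HTMLParser


class OrdinalFinder(HTMLParser):
    """Collects the symbol once for each element whose whole text is that symbol."""

    def __init__(self, symbol='º'):
        super().__init__()
        self.symbol = symbol
        self.stack = []   # one list of text chunks per currently open element
        self.found = []

    def handle_starttag(self, tag, attrs):
        # assumption: every start tag is later closed by an end tag
        self.stack.append([])

    def handle_data(self, data):
        if self.stack:
            self.stack[-1].append(data)

    def handle_endtag(self, tag):
        if not self.stack:
            return
        text = ''.join(self.stack.pop())
        if text.strip() == self.symbol:
            self.found.append(self.symbol)
        if self.stack:
            # the parent's text includes the text of its children
            self.stack[-1].append(text)


def find_symbol(html, symbol='º'):
    """Return a list with one entry per element whose stripped text equals symbol."""
    parser = OrdinalFinder(symbol)
    parser.feed(html)
    parser.close()
    return parser.found


codi = ['<b>º </b>', '<b>Nº </b>', '<b>1º </b>', '<b>1ª </b>', '<p>N<u>º </u></p>']
for i in codi:
    print('input:', i, 'output:', find_symbol(i))
```

Returning a list makes it possible to compare each result directly with the expected outputs from the question, for example with assert find_symbol(html) == expected.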

Credit for the approach: numerous answers from @QHarr - the king of soup.select().

@Jack Fleeting, is this implementation faster than the regex approach? — britodfbr, May 12 '19 at 11:39
@britodfbr - I don't know if it's faster (haven't tested it), but I personally dislike regex and if you google around you'll see that experts try to discourage the use of regex with html code. So I generally try to avoid it at all costs :) — Jack Fleeting, May 12 '19 at 11:53
