Find and replace strings in HTML

Question

From this HTML code:

<p class="description" dir="ltr">Name is a fine man. <br></p>

I'm trying to replace "Name" using the following code:

target = soup.find_all(text="Name")
for v in target:
    v.replace_with('Id')

The output I would like to have is:

<p class="description" dir="ltr">Id is a fine man. <br></p>

When I print target, I get an empty list:

print target
[]

Why doesn't it find "Name"?

Thanks!

try using `str.replace()`, Are you sure you have some text that matches it? — sinhayash, Jul 04 '15 at 11:41
You have no Element with Text `Name`. Show your HTML, and what you intend to do. — Daniel, Jul 04 '15 at 11:50
Not really that clear what you're asking. Firstly did your `soup` find any `text='Name'`? Also is there any code in between `replace_with` and `print target`? — Paul Rooney, Jul 04 '15 at 11:51
@PaulRooney, apparently my `soup` did not find `text='Name'` and there is no code in between, — Diego, Jul 04 '15 at 12:03

score 8 · Accepted Answer · answered Jul 04 '15 at 12:02

The text node in your HTML contains other text besides "Name", so an exact match fails. You need to relax the search criterion from an exact match to a "contains" match, for example by using a regex. You can then replace each matched text node with its own text in which "Name" has been replaced by "Id", using the plain str.replace() method. For example:

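from bs4 import BeautifulSoup
import re

html = """<p class="description" dir="ltr">Name is a fine man. <br></p>"""
# Name the parser explicitly; the output shown below (wrapped in <html><body>) matches lxml
soup = BeautifulSoup(html, "lxml")
# A compiled regex matches any text node that contains "Name"
target = soup.find_all(text=re.compile(r'Name'))
for v in target:
    # v behaves like a string, so str.replace() builds the new text
    v.replace_with(v.replace('Name', 'Id'))
print(soup)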

Output:

<html><body><p class="description" dir="ltr">Id is a fine man. <br/></p></body></html>
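
To see why the exact match in the question comes back empty, the three kinds of search can be compared side by side. This is only a minimal sketch: it assumes bs4 is installed, uses the standard-library "html.parser" rather than the parser that produced the output above, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup
import re

html = '<p class="description" dir="ltr">Name is a fine man. <br></p>'
soup = BeautifulSoup(html, "html.parser")

# A plain string is compared against the WHOLE text node,
# which here is "Name is a fine man. " (note the trailing space).
exact = soup.find_all(text="Name")

# Passing the full text of the node does match.
whole = soup.find_all(text="Name is a fine man. ")

# A compiled regex matches any text node that contains the pattern.
partial = soup.find_all(text=re.compile("Name"))

print(exact)    # empty list, as in the question
print(whole)
print(partial)
```

Newer Beautiful Soup 4 releases call this argument `string`; `text` is the older spelling used throughout this thread.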

score 1 · Answer 2 · edited May 23 '17 at 12:14

It returns an empty list because a plain string passed as text must match the whole text of a node, not just part of it. Use a regular expression instead.

From the official docs: BeautifulSoup - Search text

text is an argument that lets you search for NavigableString objects instead of Tags. Its value can be a string, a regular expression, a list or dictionary, True or None, or a callable that takes a NavigableString object as its argument:

soup.findAll(text="one")
# [u'one']
soup.findAll(text=re.compile("paragraph"))
# [u'This is paragraph ', u'This is paragraph ']
soup.findAll(text=lambda(x): len(x) < 12)
# [u'Page title', u'one', u'.', u'two', u'.']
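
The quoted snippet is written for Python 2 (the `lambda(x):` form is not valid in Python 3). As a rough Python 3 sketch of the callable variant applied to the HTML from the question, something like the following should work; it assumes bs4 with the standard-library "html.parser", and `contains_name` is a hypothetical helper name, not part of the library.

```python
from bs4 import BeautifulSoup

html = '<p class="description" dir="ltr">Name is a fine man. <br></p>'
soup = BeautifulSoup(html, "html.parser")

def contains_name(s):
    # Called once per text node; keep the nodes that contain "Name".
    return s is not None and "Name" in s

# find_all() returns a list up front, so replacing nodes while looping is safe.
for node in soup.find_all(text=contains_name):
    node.replace_with(node.replace("Name", "Id"))

result = str(soup)
print(result)
```

A callable is handy when the condition is more than a pattern match, for instance a length check as in the quoted docs.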

P.S.: Similar questions have already been answered here and here.
