
From this HTML code:

<p class="description" dir="ltr">Name is a fine man. <br></p>

I want to replace "Name" using the following code:

target = soup.find_all(text="Name")
for v in target:
    v.replace_with('Id')

The output I would like to have is:

<p class="description" dir="ltr">Id is a fine man. <br></p>

When I print target, I get an empty list:

print target
[]

Why doesn't it find the "Name"?

Thanks!

Diego

2 Answers


The text node in your HTML contains other text besides "Name", so an exact-match search finds nothing. Relax the search criterion from an exact match to a "contains" match, for example with a regular expression. Then replace each matched text node with its original text, with the "Name" part swapped for "Id" using the plain str.replace() method:

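from bs4 import BeautifulSoup
import re

html = """<p class="description" dir="ltr">Name is a fine man. <br></p>"""
soup = BeautifulSoup(html)
# Match every text node that contains "Name" anywhere, not only nodes equal to it
target = soup.find_all(text=re.compile(r'Name'))
for v in target:
    # v behaves like a string, so replace() returns the edited text
    v.replace_with(v.replace('Name','Id'))
print(soup)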

Output:

<html><body><p class="description" dir="ltr">Id is a fine man. <br/></p></body></html>
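
The same replacement can also be sketched without a regex, using a callable filter. This is only an illustration under a few assumptions: the helper name replace_in_text_nodes is made up for this example, the parser is pinned to the built-in "html.parser" (which does not add the html/body wrapper seen above), and string= is the newer name for the text= argument in Beautiful Soup 4.4.0 and later.

```python
from bs4 import BeautifulSoup


def replace_in_text_nodes(soup, old, new):
    """Replace `old` with `new` in every text node that contains `old`.

    Hypothetical helper for illustration; returns the number of nodes changed.
    """
    # A callable filter receives each text node; keep those containing `old`
    matches = soup.find_all(string=lambda s: s is not None and old in s)
    for node in matches:
        # node is a str subclass, so str.replace() yields the edited text
        node.replace_with(node.replace(old, new))
    return len(matches)


html = '<p class="description" dir="ltr">Name is a fine man. <br></p>'
soup = BeautifulSoup(html, "html.parser")  # assumption: stdlib-backed parser
changed = replace_in_text_nodes(soup, "Name", "Id")
result = str(soup)
print(result)
```

A substring test like this avoids regex escaping issues if the text to replace ever contains characters such as "." or "(".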
har07

It returns an empty list because a plain-string search must match the whole text of a text node, so use a regular expression instead.
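
To see the whole-text rule in action, here is a small sketch against the markup from the question. It assumes the built-in "html.parser" and the string= argument (the newer name for text= in Beautiful Soup 4.4.0 and later); the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

html = '<p class="description" dir="ltr">Name is a fine man. <br></p>'
soup = BeautifulSoup(html, "html.parser")  # assumption: stdlib-backed parser

# List every text node in the document as a plain string
text_nodes = [str(s) for s in soup.find_all(string=True)]

# A plain string only matches a text node that is exactly equal to it
exact_short = soup.find_all(string="Name")
exact_full = soup.find_all(string="Name is a fine man. ")

print(text_nodes)        # the single text node, including its trailing space
print(len(exact_short))  # no node equals "Name"
print(len(exact_full))   # the full text, trailing space included, matches
```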

From the official docs: BeautifulSoup - Search text

text is an argument that lets you search for NavigableString objects instead of Tags. Its value can be a string, a regular expression, a list or dictionary, True or None, or a callable that takes a NavigableString object as its argument:

soup.findAll(text="one")
# [u'one']
soup.findAll(text=re.compile("paragraph"))
# [u'This is paragraph ', u'This is paragraph ']
soup.findAll(text=lambda x: len(x) < 12)
# [u'Page title', u'one', u'.', u'two', u'.']
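
The quoted examples use the older findAll spelling. A rough bs4 equivalent might look like the sketch below. The sample markup is an assumption, reconstructed so that it yields the results quoted above; it is not taken from the docs verbatim.

```python
import re
from bs4 import BeautifulSoup

# Assumed sample document, shaped to reproduce the quoted results
html = (
    "<html><head><title>Page title</title></head><body>"
    "<p>This is paragraph <b>one</b>.</p>"
    "<p>This is paragraph <b>two</b>.</p>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Plain string: exact match against the whole text node
exact = [str(s) for s in soup.find_all(string="one")]

# Compiled regex: matches any text node in which the pattern is found
partial = [str(s) for s in soup.find_all(string=re.compile("paragraph"))]

# Callable: keeps the text nodes for which the function returns True
short = [str(s) for s in soup.find_all(string=lambda s: len(s) < 12)]

print(exact)
print(partial)
print(short)
```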

P.S.: This has already been discussed here and here.

devautor