
Starting from an HTML input like this:

<p>
<a href="http://www.foo.com">this if foo</a>
<a href="http://www.bar.com">this if bar</a>
</p>

Using BeautifulSoup, I would like to change this HTML into:

<p>
<a href="http://www.foo.com">this if foo</a><b>OK</b>
<a href="http://www.bar.com">this if bar</a><b>OK</b>
</p>

Is it possible to do this using BeautifulSoup?

Something like:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html)
for link_tag in soup.findAll('a'):
    link_tag = link_tag + '<b>OK</b>' #This obviously does not work
contrebis
systempuntoout
  • Yes, it is possible. BeautifulSoup has pretty good documentation: http://www.crummy.com/software/BeautifulSoup/documentation.html Post your code if you have problems and I (and others) will help. – Jack May 25 '10 at 20:50
  • Actually this particular manipulation is not trivial to pick up from the documentation. – Jon W May 26 '10 at 13:16

2 Answers


You can use BeautifulSoup's insert to add the element in the right place:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html)

for link_tag in soup.findAll('a'):
    link_tag_idx = link_tag.parent.contents.index(link_tag)
    link_tag.parent.insert(link_tag_idx + 1, '<b>OK</b>')

This works for the example you give, though I'm not sure it's the only or most efficient method.
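
For readers on BeautifulSoup 4, the same index-based idea can be sketched as follows. This is only a sketch under assumptions that are not part of the answer above: it uses the `bs4` package, the built-in `html.parser` backend, and `soup.new_tag()` to build a real `<b>` element instead of inserting a raw string (which BS4 would treat as text and escape).

```python
from bs4 import BeautifulSoup

html = """<p>
<a href="http://www.foo.com">this if foo</a>
<a href="http://www.bar.com">this if bar</a>
</p>"""

# Assumption: BeautifulSoup 4 with the standard-library parser.
soup = BeautifulSoup(html, "html.parser")

for link_tag in soup.find_all("a"):
    parent = link_tag.parent
    # Position of this <a> among its parent's children (found by identity).
    idx = parent.index(link_tag)
    # Build a real <b> element rather than inserting the string '<b>OK</b>'.
    b = soup.new_tag("b")
    b.string = "OK"
    parent.insert(idx + 1, b)

result = str(soup)
print(result)
```

Building the element with `new_tag()` avoids the escaping problem that a plain string insert runs into on BS4.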

tcarobruce
  • I think the last line should be `link_tag.parent.insert(link_tag_idx + 1, BeautifulSoup("<br>"))` as the "<" and ">" do not get converted properly on my machine.
    – sup Oct 29 '13 at 22:58
  • `soup.insert(i, tag)` is pretty counter-intuitive, as `i` doesn't directly mean anything (because my `soup.body.contents` = `['\n', '...', '\n', '', '\n', ...]`). `.append()` and `.insert_after()` work OK, though. PS: @sup you should use `br = soup.new_tag('br')`.
    – Polv Jul 24 '18 at 08:00

You have the right idea. Just match up the types, and do replaceWith.

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup(html)
for link_tag in soup.findAll('a'):
    link_tag.replaceWith( link_tag.prettify() + '<b>OK</b>' )
print soup

should give you:

<p>
 <a href="http://www.foo.com">
this if foo
</a>
<b>OK</b>
 <a href="http://www.bar.com">
this if bar
</a>
<b>OK</b>
</p>
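
On BeautifulSoup 4 a plain string passed to `replace_with()` is inserted as text, so the markup in it would be escaped. One way to keep the "match up the types" idea is to re-parse the combined markup and swap in the resulting nodes. The sketch below assumes the `bs4` package and the `html.parser` backend; the names `fragment` and `new_nodes` are illustrative only.

```python
from bs4 import BeautifulSoup

html = """<p>
<a href="http://www.foo.com">this if foo</a>
<a href="http://www.bar.com">this if bar</a>
</p>"""

# Assumption: BeautifulSoup 4 with the standard-library parser.
soup = BeautifulSoup(html, "html.parser")

for link_tag in soup.find_all("a"):
    # Re-parse the link's markup plus the new element so both become real nodes.
    fragment = BeautifulSoup(str(link_tag) + "<b>OK</b>", "html.parser")
    new_nodes = list(fragment.contents)  # [<a ...>, <b>]
    # Swap the original <a> for the re-parsed one, then attach <b> after it.
    link_tag.replace_with(new_nodes[0])
    new_nodes[0].insert_after(new_nodes[1])

result = str(soup)
print(result)
```

Converting with `str(link_tag)` instead of `prettify()` keeps the original whitespace, so the links are not reflowed onto separate lines.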
Jon W
  • Note for anyone hitting this answer from Google etc.: in BS4 there is an insert_after() method, for example: `b = soup.new_tag("b"); b.string = "OK"; link_tag.insert_after(b)` ought to work. – contrebis Sep 17 '12 at 20:03
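
Expanded into a complete script, the `insert_after()` suggestion from the comment above might look like this. It is a minimal sketch that assumes the `bs4` package and the `html.parser` backend, applied to the HTML from the question.

```python
from bs4 import BeautifulSoup

html = """<p>
<a href="http://www.foo.com">this if foo</a>
<a href="http://www.bar.com">this if bar</a>
</p>"""

# Assumption: BeautifulSoup 4 with the standard-library parser.
soup = BeautifulSoup(html, "html.parser")

for link_tag in soup.find_all("a"):
    # A fresh <b> element is needed for each link; a node can only live in one place.
    b = soup.new_tag("b")
    b.string = "OK"
    link_tag.insert_after(b)

result = str(soup)
print(result)
```

No index bookkeeping is needed here, since the new element is placed relative to the link itself rather than to a position in the parent's `contents`.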