Using BeautifulSoup and python regexp to search html for string and add some tags

Question

I am using BeautifulSoup to look for a user-entered word on a specific page and highlight every occurrence of it. For example, I want to highlight every occurrence of the word 'Finance' on the page 'https://support.google.com/finance/?hl=en&ei=VC8QVaH0N-acwgP36IG4AQ'.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import urllib2
import re
from bs4 import BeautifulSoup

html = urllib2.urlopen('https://support.google.com/finance/?hl=en&ei=VC8QVaH0N-acwgP36IG4AQ').read()
soup = BeautifulSoup(html)

matches = soup.body(text='Finance')
for match in matches:
    match.wrap(soup.new_tag('span', style="background-color:#FE00FE"))
print soup

Try `soup.body.findAll(text='Finance')`. Does it work for you? — Wiktor Stribiżew, Jun 09 '15 at 08:59
Does [this SO post](http://stackoverflow.com/q/8936030/3832970) solve your problem? If yes, this question is a duplicate. — Wiktor Stribiżew, Jun 09 '15 at 09:06
No, the result is the same: only the first word 'Finance' is highlighted in the result file. In that question, the asker wants to check whether the string 'Python' occurs on the page (one or more times), but I need to highlight every occurrence of the word on the page. — user2546252, Jun 09 '15 at 09:08
You are searching for text that's exactly the single word 'Finance'. You want to search all text _containing_ that word. Replacing the word with the highlighted word is a bit more complicated too, because you have to split strings into the parts before, between, and after the word (if there is more than one occurrence in one string). — BlackJack, Jun 09 '15 at 14:12
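
The splitting step described in the last comment can be sketched with `re.split` and a capturing group. This is only a minimal illustration using the standard library; the helper name `split_on_word` and the sample sentence are made up for the example and are not part of the original code.

```python
import re

def split_on_word(text, word):
    """Split text into alternating 'around the match' and 'match' pieces."""
    # The capturing group makes re.split keep the matched words in the result.
    pattern = re.compile(r'(\b%s\b)' % re.escape(word))
    return pattern.split(text)

pieces = split_on_word('Finance help: Google Finance and Finance news', 'Finance')

# Even indexes hold the text before, between, and after the matches;
# odd indexes hold the matched words themselves.
for index, piece in enumerate(pieces):
    kind = 'match' if index % 2 else 'text '
    print(kind, repr(piece))
```

Each odd-indexed piece is a candidate for wrapping in a highlighting tag, while the even-indexed pieces stay plain text.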

user2546252 · Answer 1 · 2015-06-10T08:00:23.537

I found this regex variant for highlighting the word, but the resulting document contains broken JavaScript.

import urllib2
import re
from bs4 import BeautifulSoup

html = urllib2.urlopen('https://support.google.com/finance/?hl=en&ei=VC8QVaH0N-acwgP36IG4AQ').read()
soup = BeautifulSoup(html)

for text in soup.body.findAll(text=True):
    if re.search(r'inance\b',text):
        new_html = "<p>"+re.sub(r'(\w*)inance\b', r'<span style="background-color:#FF00FF">\1inance</span>', text)+"</p>"
        new_soup = BeautifulSoup(new_html)
        text.parent.replace_with(new_soup.p)
print soup
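
A likely cause of the broken JavaScript is that the loop above also visits text inside `script` elements and replaces the whole parent element, which discards that element's other children as well. The following is a hedged sketch of one alternative, not a definitive fix: it replaces only the matching text node and skips scripts, styles, and comments. It assumes Python 3 and a bs4 version that accepts the `string=` argument (the code above is Python 2); the function name `highlight` and the sample HTML are invented for illustration.

```python
import re
from bs4 import BeautifulSoup, Comment, NavigableString

def highlight(soup, word, style='background-color:#FF00FF'):
    """Wrap every whole-word occurrence of `word` in a styled <span>."""
    pattern = re.compile(r'(\b%s\b)' % re.escape(word))
    # Collect the matching text nodes first, because the tree is modified below.
    for node in list(soup.find_all(string=pattern)):
        # Leave JavaScript, CSS, and HTML comments untouched.
        if isinstance(node, Comment) or node.parent.name in ('script', 'style'):
            continue
        # Odd-indexed pieces are the matches, even-indexed pieces the text around them.
        for index, piece in enumerate(pattern.split(node)):
            if not piece:
                continue
            if index % 2:
                span = soup.new_tag('span', style=style)
                span.string = piece
                node.insert_before(span)
            else:
                node.insert_before(NavigableString(piece))
        # Remove the original text node now that its pieces are in place.
        node.extract()
    return soup

sample = ('<html><body><p>Finance and <b>Google Finance</b> news</p>'
          '<script>var s = "Finance";</script></body></html>')
soup = BeautifulSoup(sample, 'html.parser')
highlight(soup, 'Finance')
print(soup)
```

Because only text nodes are replaced, sibling tags such as the `b` element in the sample keep their place, and the script content is left as it was.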
