
I need a regex to extract the text from the following tag. I am using Python and BeautifulSoup:

    <h4 style="color:#000000; line-height:20px; font-size:18px; margin-left:22px;
 overflow:auto; content:inherit; padding:10px; font-family:"Book Antiqua", 
Palatino, serif;">THE TEXT TO BE EXTRACTED IS HERE</h4></div><br /></div>

I tried the following:

stylecontent = ('color:#000000; line-height:20px; font-size:18px; margin-left:22px; '
                'overflow:auto; content:inherit; padding:10px; font-family:"Book Antiqua", '
                'Palatino, serif;')

soup = BeautifulSoup(br.response().read(), "lxml")

scrap_soup = soup.findAll('h4', {'style': stylecontent})

But it doesn't always work, because the website keeps changing the style content. Now I want to use a regex:

soup.find_all(re.compile("some_foo_regex"))

I am interested in that some_foo_regex.

Thanks.

Aniket Vij

1 Answer

You can get all the h4 tags whose only attribute is style with:

h4_tags = soup.find_all('h4', attrs={'style': True})  # Get all h4 tags that have a style attribute
for result in h4_tags:
    if len(result.attrs) == 1:                        # Keep the tag only if style is its sole attribute
        print(result.contents)                        # Print the tag's contents (a list of child nodes)
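
For reference, here is a self-contained sketch of the same filter in Python 3 that returns the text via `get_text()` rather than the list of child nodes. The sample markup and the `html.parser` backend are assumptions for illustration; the question parses the live page with `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = """
<div>
  <h4 style="color:#000000; font-size:18px;">THE TEXT TO BE EXTRACTED IS HERE</h4>
  <h4 class="title" style="color:red;">has a second attribute</h4>
  <h4>no style attribute</h4>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep h4 tags whose only attribute is style, whatever its value.
texts = [
    tag.get_text(strip=True)
    for tag in soup.find_all("h4", attrs={"style": True})
    if len(tag.attrs) == 1
]
print(texts)
```

Because the filter never compares the style value itself, it should keep working when the site changes the CSS, as long as the target `h4` carries no other attributes.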
Wiktor Stribiżew
  • Is there a way to get the text from these tags? Also, my upvote doesn't work, as I need at least 15 reputation. BTW, I also get tags which have other tags including style. – Aniket Vij Aug 24 '15 at 15:06
  • Doesn't `result.contents` print the contents? You have 15 rep now, BTW :) Also, what about the lambda solution: `h4_list = soup.find_all(lambda tag:tag.name == "h4" and len(tag.attrs) == 1 and tag["style"]) \n for result in h4_list: \n print result.contents`? – Wiktor Stribiżew Aug 24 '15 at 15:07
  • `h4_tags = soup.find_all('h4', attrs = {'style' : True})` gives me all the tags I need and a few extra, but running that loop gives me only one result. – Aniket Vij Aug 24 '15 at 15:09
  • That gives me a `KeyError: 'style'` – Aniket Vij Aug 24 '15 at 15:17
  • Thanks for letting me know. – Wiktor Stribiżew Aug 24 '15 at 15:31
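
The `KeyError: 'style'` reported in the comments is consistent with the lambda indexing `tag["style"]` on an `h4` whose single attribute is something other than `style`. Below is a minimal sketch of a variant that checks with `tag.has_attr("style")` instead, followed by a `re.compile` filter on the style value, which is closer to what the question asks for. The sample markup and the `font-family` pattern are assumptions; pick whichever fragment of the style string stays stable on the real site.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from the real page.
html = """
<h4 style="padding:10px; font-family:'Book Antiqua', Palatino, serif;">wanted</h4>
<h4 class="note">only a class attribute</h4>
<h4 style="color:red;">other style</h4>
"""
soup = BeautifulSoup(html, "html.parser")

# has_attr() returns False instead of raising when style is missing.
only_style = soup.find_all(
    lambda tag: tag.name == "h4" and len(tag.attrs) == 1 and tag.has_attr("style")
)
only_style_texts = [t.get_text() for t in only_style]
print(only_style_texts)

# Regex on the attribute value: match a (hopefully) stable part of the style string.
by_font = soup.find_all("h4", style=re.compile(r"font-family\s*:"))
by_font_texts = [t.get_text() for t in by_font]
print(by_font_texts)
```

The regex filter is searched against the attribute value, so it does not need to match the whole style string.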