
I have some tags like the ones below. They are hidden with the inline style "display: none", so I can't parse them. I want to replace style="display: none;" with an empty string or with style="display: inline;".

...
<section id="box3" class="nodisp_zero" style="display: none;">
    <h1 id="box_ttl3" style="display: none;"></h1>
    <img style="width: 100%; display: none;" id="box_img3" alt="box3" src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/201808220015_box_img3_A.jpg?1533975785">
    <figcaption id="box_caption3" style="display: none;"></figcaption>
    <div class="textarea clearfix">
        <h2 id="box_subttl3" style="display: none;"></h2>
        <div class="fontL" id="box_com3" style="display: none;"></div>
    </div>
</section>
...

I tried the code below, but I got the error TypeError: 'NoneType' object is not callable. What can I do?

driver.get(href)
soup_level2 = BeautifulSoup(driver.page_source, 'lxml')
soup_level2 = soup_level2.replace(r'display:\s*none', "")  # raises TypeError: 'NoneType' object is not callable
images = soup_level2.find_all('img')
Edited by wp78de; asked by mikezang

1 Answer


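The error comes from soup_level2.replace(...): BeautifulSoup treats an unknown attribute as a tag search, so soup_level2.replace is the same as soup_level2.find('replace'). That returns None, and calling None raises the TypeError. (A regex can't be passed to str.replace either.) Instead, you can remove "display: none;" from the style attribute like this: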

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
# style=True matches every tag that has a style attribute
for tag in soup.find_all(style=True):
    tag["style"] = tag["style"].replace("display: none;", "")
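A self-contained sketch of the same idea, run on a trimmed copy of the markup from the question. It swaps the literal str.replace for a regex so variants such as "display:none" (no space) are caught too, and it deletes the style attribute when nothing is left. Pass "display: inline;" as the replacement to get the other outcome the question asks for. The helper name unhide_tags is hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question
html_doc = """
<section id="box3" class="nodisp_zero" style="display: none;">
    <h1 id="box_ttl3" style="display: none;"></h1>
    <img style="width: 100%; display: none;" id="box_img3" alt="box3"
         src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy">
</section>
"""

# "display: none" with optional whitespace and an optional trailing semicolon
HIDDEN = re.compile(r"display:\s*none;?")


def unhide_tags(soup, replacement=""):
    """Hypothetical helper: strip (or replace) display:none in every style attribute."""
    for tag in soup.find_all(style=True):
        style = HIDDEN.sub(replacement, tag["style"]).strip()
        if style:
            tag["style"] = style
        else:
            # Nothing left in the attribute, so remove it altogether
            del tag["style"]
    return soup


soup = unhide_tags(BeautifulSoup(html_doc, "html.parser"))
print(soup.find("img")["style"])  # width: 100%;
print(soup.find("section").has_attr("style"))  # False
```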


Or use a simple regex replace on the HTML string before parsing it:

import re

html_doc = re.sub(r"display:\s*none;?", "", html_doc)
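A sketch of the string-level variant wired into the question's flow: clean the page source, parse it, then collect the img tags. A literal string stands in for driver.page_source here. One observation from the question's markup: the img src is a placeholder (dummy.png) and the real URL is in data-original, and the "lazy" class suggests a lazy-loading script swaps it in. That reading of the page is an assumption, so the sketch simply prefers data-original when it exists.

```python
import re

from bs4 import BeautifulSoup

# Stand-in for driver.page_source: the <img> tag from the question
page_source = (
    '<img style="width: 100%; display: none;" id="box_img3" alt="box3" '
    'src="https://smtgvs.weathernews.jp/s/topics/img/dummy.png" class="lazy" '
    'data-original="https://smtgvs.weathernews.jp/s/topics/img/201808/'
    '201808220015_box_img3_A.jpg?1533975785">'
)

# re.sub interprets the pattern as a regex; str.replace would match it literally
cleaned = re.sub(r"display:\s*none;?", "", page_source)

soup_level2 = BeautifulSoup(cleaned, "html.parser")
images = soup_level2.find_all("img")

# Assumption: lazy images keep a placeholder in src and the real URL in
# data-original, so use data-original when present and fall back to src
urls = [img.get("data-original") or img.get("src") for img in images]
print(urls)
```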



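There are several ways to wait for the content to load with Selenium, for example an explicit wait (element_id and timeout are placeholders):

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
WebDriverWait(driver, timeout).until(EC.presence_of_element_located((By.ID, 'element_id')))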
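If it is the page's own JavaScript that sets display: none once the page has loaded, another option is to un-hide the elements in the browser itself, via Selenium's driver.execute_script, and only then read page_source. This is a sketch under that assumption and has not been tried against this site; the names UNHIDE_JS and page_source_unhidden are hypothetical. If the page's script hides the elements again later, this would have to run after it.

```python
# Hypothetical JavaScript run inside the loaded page: clear the inline
# "display" value on every element that has a style attribute
UNHIDE_JS = """
document.querySelectorAll('[style]').forEach(function (el) {
    el.style.removeProperty('display');
});
"""


def page_source_unhidden(driver):
    """Hypothetical helper: un-hide elements in the live DOM, then return its HTML.

    `driver` is expected to be a Selenium WebDriver; any object with an
    execute_script() method and a page_source attribute works.
    """
    driver.execute_script(UNHIDE_JS)
    return driver.page_source


# Usage with a real browser (not run here):
# driver.get(href)
# soup_level2 = BeautifulSoup(page_source_unhidden(driver), 'lxml')
```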
Answered by wp78de
  • Your code also doesn't work, and I found the reason: if I use Selenium to get the page, the JavaScript function runs when the page loads, so those tags with ` – mikezang Aug 27 '18 at 08:45
  • @mikezang Try waiting for the content. Even sleeping the thread for a few seconds could help. – wp78de Aug 27 '18 at 16:09
  • It is not a problem of the content not being fully loaded. – mikezang Aug 28 '18 at 00:08