I am using BeautifulSoup to create and write an HTML file. I am able to create a simple HTML document, as shown in the MWE below. However, all find functions return nothing, so I am unable to perform further operations (insert, append).

  1. What is happening?
  2. How do I set a style on just one of the divs? (e.g., div2 and div3 should have display:none, which I plan to enable later via script)

MWE:

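from bs4 import BeautifulSoup

# nbheader_template and output_filename are file paths defined elsewhere (not shown in this MWE)
with open(nbheader_template) as f:
    head_soup = BeautifulSoup(f, "html.parser")
head_soup.contents[0]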

base_template = "<!DOCTYPE html><html></html>"
main_soup = BeautifulSoup(base_template,"html.parser")

main_soup.html.append(head_soup)  # add nbconvert header

# INSERT THE BODY AS IT IS
# bodies = [body.replace('<body>','').replace('</body>','') for body in bodies]  # no need of body tags
bodies = ['<div>Test div' + str(i+1) + '</div>' for i in range(3)] # for MWE
body_tag = main_soup.new_tag('body')
for each_body in bodies:
    body_tag.append(BeautifulSoup(each_body,'html.parser'))
main_soup.html.insert(1,body_tag)    


with open(output_filename, "w") as file:
    file.write(str(main_soup))

print(main_soup.find_all('head'))
print(main_soup.html.find_all('head'))
print(main_soup.find_all('body'))
print(main_soup.html.find_all('body'))
print(main_soup.find_all('div'))
print(main_soup.html.find_all('div'))

Output: [screenshot of the printed results, not available]

File Output: [screenshot of the written HTML file, not available]

Context: I am trying to combine multiple jupyter notebook html files. After this update, I need to add styles to individual divs corresponding to each html (each notebook) file.

Here is the nbviewer head

Parthiban Rajendran

1 Answer

It looks as though BeautifulSoup is not adding the new content as proper navigable nodes, but as plain strings instead. As a result, the find functions don't work on it. However, if you take main_soup.prettify() and feed it back into BeautifulSoup, you can navigate the output as expected.

>>> main_soup
<!DOCTYPE html>
<html><body><div>Test div1</div><div>Test div2</div><div>Test div3</div></body></html>
>>> new_soup = BeautifulSoup(main_soup.prettify(), "html.parser")
>>> new_soup.body
<body>
<div>
 Test div1
</div><div>
 Test div2
</div><div>
 Test div3
</div>
</body>
>>> new_soup.html.find_all('div')
[<div>
 Test div1
</div>, <div>
 Test div2
</div>, <div>
 Test div3
</div>]
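
A minimal, self-contained sketch of the same round trip, assuming the html.parser backend and the three placeholder divs from the question. It re-parses str(main_soup) rather than prettify() so that no extra whitespace ends up in the text nodes; that substitution is my own choice, not part of the session above.

```python
from bs4 import BeautifulSoup

# Build the document the same way the question does.
main_soup = BeautifulSoup("<!DOCTYPE html><html></html>", "html.parser")
body_tag = main_soup.new_tag("body")
for i in range(3):
    fragment = BeautifulSoup(f"<div>Test div{i + 1}</div>", "html.parser")
    body_tag.append(fragment)
main_soup.html.append(body_tag)

# Serialize the tree and parse it again to get a freshly built tree.
reparsed = BeautifulSoup(str(main_soup), "html.parser")

divs = reparsed.find_all("div")
div_texts = [div.get_text() for div in divs]
print(div_texts)  # ['Test div1', 'Test div2', 'Test div3']
```

Any further edits (insert, append, attribute changes) would then be made on reparsed, and that is the object to write to the output file.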

To set a style on one of the divs, navigate to it and add a class for the style you want. Giving every individual div its own inline style becomes unwieldy unless each style is used in only one place. I recommend defining the styles in CSS with classes and assigning those classes to the divs you wish.
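
As a rough sketch of both options (a shared CSS class versus an inline style), assuming a tree that find_all can already navigate. The class name hidden and the sample markup are made up for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup standing in for the combined notebook file.
html = (
    "<html><head></head><body>"
    "<div>Test div1</div><div>Test div2</div><div>Test div3</div>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")
divs = soup.find_all("div")

# Option 1: assign a class and define the rule once in a <style> tag.
divs[1]["class"] = ["hidden"]  # "hidden" is a hypothetical class name
style_tag = soup.new_tag("style")
style_tag.string = ".hidden { display: none; }"
soup.head.append(style_tag)

# Option 2: set an inline style directly on a single div.
divs[2]["style"] = "display:none"

print(soup)
```

A script can later reveal the class-based div by removing the class, and the inline-styled one by clearing its style attribute.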

B.Adler
  • I just found that out the hard way, by writing the HTML and then reading it back into soup. Because of this I already have another question: I could not figure out how to add a style to the div elements via soup. I just updated the question. Could you kindly check and comment? – Parthiban Rajendran Dec 04 '18 at 17:11
  • Such hardship also makes me wonder: is soup the right tool for creating and editing HTML? (Or only for parsing existing documents, as in scraping?) – Parthiban Rajendran Dec 04 '18 at 17:12
  • It's a tool used mostly for scraping, not often for editing. It is decent for editing plain html, but kind of heavy for adding styling. I would not recommend it for css. – B.Adler Dec 04 '18 at 17:14
  • Depending on what parser you are using, BeautifulSoup will also ignore a lot of tags like
    so it's not recommended for combining files reliably either.
    – B.Adler Dec 04 '18 at 17:19
  • Additionally, BeautifulSoup's select method uses str.split(".") to handle query selectors, so it is not reliable for getting the correct class names on all pages, as it does not properly escape characters in selectors. Its find method is more reliable, but also slower, so there are trade-offs to using it. – B.Adler Dec 04 '18 at 17:21
  • In that case, what is currently a better alternative tool for my purpose? (I am checking yattag now) – Parthiban Rajendran Dec 04 '18 at 17:22
  • https://stackoverflow.com/questions/6748559/generating-html-documents-in-python A lot of people suggest using django, though yattag is another good option. – B.Adler Dec 04 '18 at 17:31