
I am trying to save the text of a BeautifulSoup object to a file that I can later edit and use. I have all the necessary modules imported, but I get the same error every time at "pagename.write(str(soup))". I've tried rewriting this multiple ways and I'm stumped.

#Testing implementation of writing to file
#save the HTML to a beautiful soup object
soup = BeautifulSoup(browser.page_source, 'html.parser')

#TODO: use breadcrumb of page name for loop later on
breadcrumb = soup.select('.breadcrumb span')
pagename = breadcrumb[0].get_text()

#open a file then write to it
bookPage = os.path.join('books/cpp/VST', pagename+'.txt')
open(pagename, 'wb')
pagename.write(str(soup))

#close file
#pagename.close()


#TODO: move on to next file
Dakota Lorance

1 Answer

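pagename is a string (the page name extracted from the HTML), not a file object, so it has no write() method.

What you meant to do is open the bookPage path, ideally with a with context manager, and write to the file object that open() returns. Also, because the file is opened in binary mode ('wb'), writing str(soup) would raise TypeError: a bytes-like object is required, not 'str'. To get a bytestring, call encode():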

with open(bookPage, 'wb') as f:
    f.write(soup.encode("utf-8"))
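
If you would rather keep writing str(soup), an alternative is to open the file in text mode with an explicit encoding. Below is a minimal sketch of that variant, not a drop-in for your script: save_page is a hypothetical helper, html_text is a stand-in for str(soup) so the snippet runs without a browser, and a temporary directory stands in for your books/cpp/VST folder.

```python
import os
import tempfile


def save_page(html_text, directory, pagename):
    """Write html_text to <directory>/<pagename>.txt and return the path.

    save_page is a hypothetical helper, not part of BeautifulSoup.
    """
    # open() does not create missing folders, so make sure they exist first.
    os.makedirs(directory, exist_ok=True)
    book_page = os.path.join(directory, pagename + '.txt')

    # Text mode ('w') accepts str; the encoding argument handles the bytes.
    with open(book_page, 'w', encoding='utf-8') as f:
        f.write(html_text)
    return book_page


# Stand-in for str(soup); in the real script this would come from the page.
html_text = '<html><body><span>Café</span></body></html>'

# Assumption: a temporary folder replaces the real books/cpp/VST directory.
out_dir = os.path.join(tempfile.mkdtemp(), 'books', 'cpp', 'VST')
saved_path = save_page(html_text, out_dir, 'Introduction')
```

Either way, the with block closes the file for you, so the separate close() call is not needed.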
alecxe