```python
self.urlOpen = urllib.request.urlopen("http://facebook.com")
self.content = self.urlOpen.read()
soup = BeautifulSoup(self.content, "html5lib")
self.links = soup.find_all("a")
```

This code gives the error:

`'charmap' codec can't encode characters in position....`

So when I try to encode the soup variable:

```python
self.urlOpen = urllib.request.urlopen("http://facebook.com")
self.content = self.urlOpen.read()
soup = BeautifulSoup(self.content, "html5lib")
soup = soup.encode("utf-8")
self.links = soup.find_all("a")
```

I get:

`'bytes' object has no attribute 'find_all'`

I have also tried:

```python
self.urlOpen = urllib.request.urlopen("http://facebook.com")
self.content = self.urlOpen.read()
soup = BeautifulSoup(self.content.decode("utf-8", "ignore"), "html5lib")
self.links = soup.find_all("a")
```

but the same error occurs.

Then how should I encode it?

Sriker Ch
  • Include full tracebacks when asking for debugging help. – Ilja Everilä Jul 04 '16 at 10:15
  • Also, `soup.encode('utf-8')` just [creates a byte-string](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#non-pretty-printing) out of the HTML, which of course has no method `find_all()`. – Ilja Everilä Jul 04 '16 at 10:19
  • You're probably suffering from [this](http://stackoverflow.com/questions/14284269/why-doesnt-python-recognize-my-utf-8-encoded-source-file): your terminal can't handle the output instead of any problems with beautifulsoup etc. – Ilja Everilä Jul 04 '16 at 10:23
  • Thank you @Ilja Everilä, that was brief. But how should I encode it to prevent such an error? – Sriker Ch Jul 04 '16 at 10:35
  • I am using `Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:25:23) [MSC v.1600 64 bit (AMD64)] on win32`. – Sriker Ch Jul 04 '16 at 11:07
  • Yes, I am running this in the cmd shell. – Sriker Ch Jul 04 '16 at 11:14
  • Yes, I have Visual Studio 2015, but there was a problem importing the libraries, so I am using cmd. – Sriker Ch Jul 04 '16 at 11:18
  • Actually, I am trying to print all the links to a text file. Will it make a difference? – Sriker Ch Jul 04 '16 at 11:20
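As the comments above suggest, the `'charmap'` error usually comes from the Windows console's code page rather than from BeautifulSoup itself. A minimal sketch to check this (the snowman character is just an arbitrary symbol that the cmd code pages typically cannot encode; it is an illustration, not part of the original code):

```python
import sys

# In a Windows cmd shell this often reports 'cp437' or 'cp1252', not 'utf-8'
print(sys.stdout.encoding)

# Printing a character the console code page cannot represent raises the same kind of error:
# UnicodeEncodeError: 'charmap' codec can't encode character '\u2603' ...
print("\u2603")
```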

1 Answer


What is the issue?
`find_all` shouldn't be throwing an encoding error, and you shouldn't be calling `encode` on a `bs4.BeautifulSoup` object: `encode` returns a bytestring, not a soup, so you can't call `find_all` on it.

Are you using `soup.prettify()` anywhere? In that case, that is probably the line throwing the error. Please include a Minimal, Complete and Verifiable example of your code.
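To make that concrete, here is a minimal sketch using a tiny inline HTML string and Python's built-in `html.parser` (stand-ins for the original Facebook page and `html5lib`):

```python
from bs4 import BeautifulSoup

html = '<a href="http://example.com/page">some link</a>'  # stand-in document
soup = BeautifulSoup(html, "html.parser")

encoded = soup.encode("utf-8")
print(type(encoded))        # <class 'bytes'> -- a byte string, which has no find_all()
print(type(soup))           # <class 'bs4.BeautifulSoup'>

links = soup.find_all("a")  # call find_all on the soup itself, not on the encoded bytes
print(links[0]["href"])
```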

Nee
  • But my question is how to encode it so that I can call `find_all`. – Sriker Ch Jul 04 '16 at 10:46
  • You *should* be able to use find_all. Please post a representative example of your code and the full traceback of the error message. – Nee Jul 04 '16 at 10:53
  • My code:

    ```python
    import urllib
    import urllib.request
    from bs4 import BeautifulSoup

    urlOpen = urllib.request.urlopen("http://facebook.com")
    content = self.urlOpen.read()
    soup = BeautifulSoup(content, "html5lib")
    links = soup.find_all("a")
    f = open("links.txt", "w+")
    for link in links:
        href = link.get("href")
        text = link.text
        f.write(text + "--\t" + self.href + "\n")
        f.flush()
    f.close()
    ```

    – Sriker Ch Jul 04 '16 at 11:10
  • First off, you're setting `href` to `link.get("href")` but later inserting `self.href` instead of `href` into the string; that should throw an error. Second, it's probably `f.write` that's throwing the error. Try opening the file using `f = open("links.txt", "w", encoding='utf-8')`. Third, your example works just fine for me. – Nee Jul 04 '16 at 11:21
  • Sorry, my bad, `self` was a typo. But I have to try `encoding='utf-8'`. – Sriker Ch Jul 04 '16 at 11:22
  • Thanks, that worked! `f=open("link.txt","w+",encoding="utf-8")` – Sriker Ch Jul 04 '16 at 11:29
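For completeness, here is a sketch of the script with the fix from this thread applied: the output file is opened with an explicit UTF-8 encoding, so non-ASCII link text never reaches the console's 'charmap' codec. The file name and the use of `html.parser` instead of `html5lib` are assumptions for illustration:

```python
import urllib.request
from bs4 import BeautifulSoup

content = urllib.request.urlopen("http://facebook.com").read()
soup = BeautifulSoup(content, "html.parser")

# Writing to a UTF-8 encoded file sidesteps the Windows console encoding entirely
with open("links.txt", "w", encoding="utf-8") as f:
    for link in soup.find_all("a"):
        href = link.get("href") or ""   # some anchors have no href attribute
        f.write(link.text + "--\t" + href + "\n")
```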