```python
self.urlOpen = urllib.request.urlopen("http://facebook.com")
self.content = self.urlOpen.read()
soup = BeautifulSoup(self.content, "html5lib")
self.links = soup.find_all("a")
```

This code gives the error:

`'charmap' codec can't encode characters in position....`

So when I try to encode the soup variable:

```python
self.urlOpen = urllib.request.urlopen("http://facebook.com")
self.content = self.urlOpen.read()
soup = BeautifulSoup(self.content, "html5lib")
soup = soup.encode("utf-8")
self.links = soup.find_all("a")
```

I get:

`'bytes' object has no attribute 'find_all'`

I have also tried:

```python
self.urlOpen = urllib.request.urlopen("http://facebook.com")
self.content = self.urlOpen.read()
soup = BeautifulSoup(self.content.decode("utf-8", "ignore"), "html5lib")
self.links = soup.find_all("a")
```

but the same error occurs.

Then how should I encode it?

Sriker Ch
  • Include full tracebacks when asking for debugging help. – Ilja Everilä Jul 04 '16 at 10:15
  • Also, `soup.encode('utf-8')` just [creates a byte-string](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#non-pretty-printing) out of the HTML, which of course has no method `find_all()`. – Ilja Everilä Jul 04 '16 at 10:19
  • You're probably suffering from [this](http://stackoverflow.com/questions/14284269/why-doesnt-python-recognize-my-utf-8-encoded-source-file): your terminal can't handle the output instead of any problems with beautifulsoup etc. – Ilja Everilä Jul 04 '16 at 10:23
  • Thank you @Ilja Everilä, that was brief. But how should I encode it to prevent such an error? – Sriker Ch Jul 04 '16 at 10:35
  • I am using `Python 3.4.0 (v3.4.0:04f714765c13, Mar 16 2014, 19:25:23) [MSC v.1600 64 bit (AMD64)] on win32`. – Sriker Ch Jul 04 '16 at 11:07
  • Yes, I am running this in the cmd shell. – Sriker Ch Jul 04 '16 at 11:14
  • Yes, I have Visual Studio 2015, but there was a problem importing the libraries, so I am using cmd. – Sriker Ch Jul 04 '16 at 11:18
  • Actually, I am trying to print all the links to a text file. Will it make a difference? – Sriker Ch Jul 04 '16 at 11:20
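As the comments above suggest, the `'charmap'` error usually comes from the Windows console's code page rather than from BeautifulSoup itself. A minimal sketch to check this (the snowman character is just an arbitrary symbol that the cmd code pages typically cannot encode; it is an illustration, not part of the original code):

```python
import sys

# In a Windows cmd shell this often reports 'cp437' or 'cp1252', not 'utf-8'
print(sys.stdout.encoding)

# Printing a character the console code page cannot represent raises the same kind of error:
# UnicodeEncodeError: 'charmap' codec can't encode character '\u2603' ...
print("\u2603")
```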

1 Answer


What is the issue?
`find_all` shouldn't be throwing an encoding error, and you shouldn't be calling `encode` on a `bs4.BeautifulSoup` object: `encode` returns a bytestring, not a soup, so you can't call `find_all` on it.

Are you using `soup.prettify()` anywhere? In that case, that is probably the line throwing the error. Please include a Minimal, Complete and Verifiable example of your code.
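To make that concrete, here is a minimal sketch using a tiny inline HTML string and Python's built-in `html.parser` (stand-ins for the original Facebook page and `html5lib`):

```python
from bs4 import BeautifulSoup

html = '<a href="http://example.com/page">some link</a>'  # stand-in document
soup = BeautifulSoup(html, "html.parser")

encoded = soup.encode("utf-8")
print(type(encoded))        # <class 'bytes'> -- a byte string, which has no find_all()
print(type(soup))           # <class 'bs4.BeautifulSoup'>

links = soup.find_all("a")  # call find_all on the soup itself, not on the encoded bytes
print(links[0]["href"])
```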

Nee
  • But my question is how to encode it so that I can call `find_all`. – Sriker Ch Jul 04 '16 at 10:46
  • You *should* be able to use find_all. Please post a representative example of your code and the full traceback of the error message. – Nee Jul 04 '16 at 10:53
  • My code:

    ```python
    import urllib
    import urllib.request
    from bs4 import BeautifulSoup

    urlOpen = urllib.request.urlopen("http://facebook.com")
    content = self.urlOpen.read()
    soup = BeautifulSoup(content, "html5lib")
    links = soup.find_all("a")
    f = open("links.txt", "w+")
    for link in links:
        href = link.get("href")
        text = link.text
        f.write(text + "--\t" + self.href + "\n")
        f.flush()
    f.close()
    ```

    – Sriker Ch Jul 04 '16 at 11:10
  • First off, you're setting `href` to `link.get("href")` but later inserting `self.href` instead of `href` into the string; that should throw an error. Second, it's probably `f.write` that's throwing the error. Try opening the file using `f = open("links.txt", "w", encoding='utf-8')`. Third, your example works just fine for me. – Nee Jul 04 '16 at 11:21
  • Sorry, my bad, `self` was a typo. But I have to try `encoding='utf-8'`. – Sriker Ch Jul 04 '16 at 11:22
  • Thanks, that worked! `f=open("link.txt","w+",encoding="utf-8")` – Sriker Ch Jul 04 '16 at 11:29
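For completeness, here is a sketch of the script with the fix from this thread applied: the output file is opened with an explicit UTF-8 encoding, so non-ASCII link text never reaches the console's 'charmap' codec. The file name and the use of `html.parser` instead of `html5lib` are assumptions for illustration:

```python
import urllib.request
from bs4 import BeautifulSoup

content = urllib.request.urlopen("http://facebook.com").read()
soup = BeautifulSoup(content, "html.parser")

# Writing to a UTF-8 encoded file sidesteps the Windows console encoding entirely
with open("links.txt", "w", encoding="utf-8") as f:
    for link in soup.find_all("a"):
        href = link.get("href") or ""   # some anchors have no href attribute
        f.write(link.text + "--\t" + href + "\n")
```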