3

I'm using Python to retrieve a page's HTML source, but the output looks like this. What is it, and why am I not getting the actual page source?

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C

user5508043

2 Answers

5

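This is an image, specifically a JPEG. Since it's a byte stream, Python prints it as a bytes literal, b'...'. A JPEG starts with the bytes \xff\xd8\xff, and the JFIF marker is visible a few bytes later in your output. So the URL you fetched is most likely serving an image rather than an HTML page.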
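As a rough sketch of how you could check this yourself, the snippet below tests whether a bytes payload starts with the JPEG signature before you try to treat it as HTML. The names `JPEG_MAGIC`, `is_jpeg`, and `describe_payload` are made up for illustration, and `sample` is just the first few bytes from the question, not a complete image.

```python
# Hypothetical helpers for illustration; not part of any library.
JPEG_MAGIC = b"\xff\xd8\xff"  # first three bytes of a JPEG file


def is_jpeg(data: bytes) -> bool:
    """Return True if the payload starts with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


def describe_payload(data: bytes) -> str:
    """Roughly classify a payload as 'jpeg', 'text', or 'binary'."""
    if is_jpeg(data):
        return "jpeg"
    try:
        # Assumption: text pages are UTF-8; other encodings would need more care.
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"
    return "text"


# The first bytes shown in the question (truncated there, so truncated here).
sample = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01"

print(is_jpeg(sample))
print(describe_payload(b"<html></html>"))
```

If the check reports a JPEG, write the bytes to a file opened in binary mode (`"wb"`) and open it in an image viewer instead of trying to decode it as text.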

click_twice
0

Try using BeautifulSoup.

Here's an example: "How to correctly parse UTF-8 encoded HTML to Unicode strings with BeautifulSoup?"

Basically, what you're seeing is encoded characters that need to be decoded.

Community
AndrewSmiley