I'm using Python to retrieve a page's HTML source, but the output looks like this. What is it, and why am I not getting the actual page source?
b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C
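This is an image, specifically a JPEG. Because the response body is a byte stream, Python prints it as a bytes literal: b'.............'
A JPEG starts with the bytes \xff\xd8\xff, and the JFIF marker a few bytes later confirms it. The URL you requested is returning an image rather than an HTML page.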
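A quick way to confirm this is to check the first bytes of the response yourself. This is only a minimal sketch: the function name looks_like_jpeg and the sample data are made up for illustration, and the sample is just the first few bytes of the output shown in the question.

```python
# JPEG data begins with these three bytes.
JPEG_MAGIC = b"\xff\xd8\xff"


def looks_like_jpeg(data: bytes) -> bool:
    """Return True if the data begins with the JPEG signature."""
    return data.startswith(JPEG_MAGIC)


# First bytes of the output from the question (hypothetical sample).
sample = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01"

print(looks_like_jpeg(sample))
print(looks_like_jpeg(b"<!DOCTYPE html>"))
```

If the check returns True for your response body, write the bytes to a file opened in binary mode instead of trying to parse them as HTML.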
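If the response really is HTML, try using BeautifulSoup.
Here's an example: "How to correctly parse UTF-8 encoded HTML to Unicode strings with BeautifulSoup?"
Basically, what you're seeing is raw bytes. HTML bytes need to be decoded into text before you can read them, but that only works when the bytes are actually encoded text; the output above is image data, so decoding it will not produce page source.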
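Whether decoding makes sense depends on what the server actually sent. Here is a rough sketch using only the standard library; body_to_text is a hypothetical helper, and it assumes you already have the Content-Type header value and the raw body from whatever HTTP client you use.

```python
from email.message import Message


def body_to_text(content_type: str, body: bytes, default_charset: str = "utf-8"):
    """Decode body to str if Content-Type is a text type; otherwise return None."""
    # Reuse the stdlib header parser to split the media type and charset.
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_maintype() != "text":
        return None  # e.g. image/jpeg: binary data, not text
    charset = msg.get_content_charset() or default_charset
    return body.decode(charset, errors="replace")


print(body_to_text("text/html; charset=utf-8", b"<p>hello</p>"))
print(body_to_text("image/jpeg", b"\xff\xd8\xff\xe0"))
```

Only when this returns a string is it worth handing the result to an HTML parser such as BeautifulSoup.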