I'm trying to access a website using httplib (or urllib2; either is fine for me).

I just want to fetch the page so I can parse the HTML and look for something. However, no matter how I try, every attempt ends with an error from the server.

For example:

import httplib
conn = httplib.HTTPSConnection("mangapanda.onl")
conn.request("GET", "/")
response = conn.getresponse()
print response.status, response.reason

Ends with:

500 Internal Server Error

And:

import urllib2
redirect_handler= urllib2.HTTPRedirectHandler()
opener = urllib2.build_opener(redirect_handler)
r = opener.open('https://www.mangapanda.onl/')
print r.code, r.msg

Raises an exception in the open function with:

urllib2.HTTPError: HTTP Error 403: Forbidden

I've tried several URLs with each library, removing the trailing "/" from the URL and so forth, but I haven't been able to make it work.

Furthermore, what I really want is to understand why this is happening. The only reason I can think of is that the site uses some kind of redirect that the library isn't able to follow, but after the last snippet I thought it would follow it.

Is it a URL syntax problem? How should I write it? Why? How can I solve this?

Btc Sources

1 Answer

This is probably because the server cannot tell where the request is coming from. Some websites also reject requests they deem to be bot activity. To fix the problem, you can send "fake" information with the request in the form of headers, for example a browser-like User-Agent. Check out the urllib2 Request documentation, and the linked Stack Overflow post on how to set headers with urllib2.
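As a rough illustration of the header approach, here is a minimal sketch using urllib2's own Request class. The helper names `build_request` and `fetch_html` and the exact User-Agent string are assumptions made for this example, not something the site is known to require; whether this particular server accepts the request is not guaranteed. The import fallback is only there so the same sketch also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as used in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent module

# Assumed browser-like User-Agent; the value is illustrative only.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/70.0 Safari/537.36",
}


def build_request(url, headers=BROWSER_HEADERS):
    """Return a Request object that carries the given headers."""
    return urllib2.Request(url, headers=headers)


def fetch_html(url):
    """Open the URL with the custom headers and return the raw body."""
    response = urllib2.urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()
```

Calling `fetch_html('https://www.mangapanda.onl/')` sends the request with the custom User-Agent instead of the library's default one. If the server still answers with an error, urllib2 raises HTTPError as before, so other headers may be needed.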

Joseph
  • Thanks for your explanation Joseph. I've solved it using headers as you said. However, I was using urllib2 library and it has its own way to use them, no need for another library. Would you like to update your answer with urllib2 info, so I can accept it? ;) Anyway I'm upvoting. – Btc Sources Nov 21 '18 at 13:22
  • Done! Linked the right library as well as a Stack Overflow post on how to fix it with urllib2 headers. – Joseph Nov 22 '18 at 01:04