This simple code works in Microsoft Edge but not in Chrome (both using Jupyter):

import pandas as pd
url_Chelsea = "https://en.wikipedia.org/wiki/List_of_Chelsea_F.C._seasons"

df_Chelsea=pd.read_html(url_Chelsea)[2]
df_Chelsea

In Chrome I get this error message (end of the traceback):

/opt/conda/lib/python3.6/site-packages/pandas/compat/__init__.py in raise_with_traceback(exc, traceback)
    338         if traceback == Ellipsis:
    339             _, _, traceback = sys.exc_info()
--> 340         raise exc.with_traceback(traceback)
    341 else:
    342     # this version of raise is a syntax error in Python 3

URLError: <urlopen error Tunnel connection failed: 403 Forbidden>

[Three screenshots of the full error message]

Alan
  • Have you tested with other links, such as other Wikipedia articles? Is it the same for all of them? – imxitiz Aug 02 '21 at 01:46
  • Yes, I have tried for the 4 London clubs: Chelsea, Tottenham, Arsenal and Fulham and same problem, thanks – Alan Aug 02 '21 at 01:50
  • Are you using a proxy? – imxitiz Aug 02 '21 at 02:02
  • Not really familiar with these technologies, so can't tell – Alan Aug 02 '21 at 02:09
  • Please [edit] the question and write this: "I am getting this error message after implementing your answer, @Xitiz: <>" – imxitiz Aug 02 '21 at 02:23
  • Just a few seconds. I am getting the message "It looks like your post is mostly code; please add some more details", so I cannot submit. Can I submit just the plain text? It will not be easy to read. – Alan Aug 02 '21 at 02:30
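
The proxy question in the comments above can be checked from inside the notebook. The traceback ends in a `URLError`, which is raised by Python's `urllib`, so one option is to ask `urllib` which proxies it would pick up. This is a minimal sketch and not part of the original post: `urllib.request.getproxies()` is a standard-library call, while `proxy_env_vars` and `detected_proxies` are hypothetical helper names introduced here for illustration.

```python
import os
import urllib.request


def proxy_env_vars(environ=None):
    """Return the proxy-related environment variables that are set (hypothetical helper)."""
    if environ is None:
        environ = os.environ
    wanted = {"http_proxy", "https_proxy", "no_proxy"}
    # Variable names are matched case-insensitively (HTTP_PROXY and http_proxy both count).
    return {name: value for name, value in environ.items() if name.lower() in wanted}


def detected_proxies():
    """Return the scheme -> proxy URL mapping that urllib would use by default."""
    return urllib.request.getproxies()


if __name__ == "__main__":
    print("Proxy environment variables:", proxy_env_vars())
    print("Proxies seen by urllib:", detected_proxies())
```

If either line prints a non-empty mapping, the kernel is configured to send requests through a proxy, and the `403 Forbidden` is probably coming from that proxy rather than from Wikipedia.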

1 Answer

Try this:

import pandas as pd
import requests

url_Chelsea = "https://en.wikipedia.org/wiki/List_of_Chelsea_F.C._seasons"
proxyDict = { 
          'http'  : "add http proxy", 
          'https' : "add https proxy"
        }

# Fetch the page through the proxy and keep the HTML text for pandas
page = requests.get(url_Chelsea, proxies=proxyDict).text

df_Chelsea=pd.read_html(page)[2]
print(df_Chelsea)

For more information about proxies visit here
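
The placeholder strings in `proxyDict` have to be replaced with real proxy URLs before the request can succeed. As a hedged sketch, a small check can reject values that are not well-formed URLs before they reach `requests`. The helper name `build_proxies` and the address `proxy.example.com:8080` are made up for illustration and are not taken from the answer.

```python
from urllib.parse import urlsplit


def build_proxies(proxy_url):
    """Return a requests-style proxies dict, or raise ValueError for a malformed URL.

    Hypothetical helper: it only checks the shape of the URL, not whether
    the proxy is reachable or allows the connection.
    """
    parts = urlsplit(proxy_url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not a usable proxy URL: {proxy_url!r}")
    # The same proxy is used for both plain HTTP and HTTPS traffic.
    return {"http": proxy_url, "https": proxy_url}


if __name__ == "__main__":
    # Assumed example address; substitute the proxy your network actually uses.
    print(build_proxies("http://proxy.example.com:8080"))
```

The returned dict has the shape that the `proxies=` argument of `requests.get` expects. The real proxy address has to come from whoever runs your network or Jupyter host.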

imxitiz
  • Without more information I can't say for certain that this will solve your problem, but try it anyway and let me know. – imxitiz Aug 02 '21 at 02:15
  • Get a different error message. Last line: ProxyError: HTTPSConnectionPool(host='en.wikipedia.org', port=443): Max retries exceeded with url: /wiki/List_of_Chelsea_F.C._seasons (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 403 Forbidden',))) – Alan Aug 02 '21 at 02:17
  • Complete error message is too long to put in a comment (600 characters max) – Alan Aug 02 '21 at 02:21
  • I have commented on the question; please follow that. :) – imxitiz Aug 02 '21 at 02:24
  • @Alan I hadn't noticed that you had edited the comment. You are using a proxy. Follow the last edit of the answer. – imxitiz Aug 02 '21 at 02:34
  • Running the last edit of the answer gives the same error message, which is now shown in the pictures added to the edited post. Please take a look. – Alan Aug 02 '21 at 02:39
  • I edited it just a few seconds ago. I don't think you have implemented that version. – imxitiz Aug 02 '21 at 02:40
  • I still get an error message, not the same. Last 3 lines: ProxyError: HTTPSConnectionPool(host='en.wikipedia.org', port=443): Max retries exceeded with url: /wiki/List_of_Chelsea_F.C._seasons (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known',))) – Alan Aug 02 '21 at 02:46
  • I believe you have replaced the `add http proxy` part with your proxy, right? – imxitiz Aug 02 '21 at 02:48
  • The proxyDict? Yes I did. For line df_Chelsea=pd.read_html(page)[2], shouldn't it be url instead of page? – Alan Aug 02 '21 at 02:50
  • You can, that is just the name. Is that error occurring in both browsers? – imxitiz Aug 02 '21 at 02:51
  • Ok, thanks for your time @Xitiz, I really appreciate it. I will just have to work with Edge when dealing with this pandas read_html function. Best – Alan Aug 02 '21 at 02:55