Web Scraping using BeautifulSoup error: [Errno 10061]

Question

I am trying to get this web scraping sample, which uses BeautifulSoup, to work:

import urllib2
from bs4 import BeautifulSoup

wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"
page = urllib2.urlopen(wiki)
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(page, "html.parser")

I get this error:

URLError: <urlopen error [Errno 10061] No connection could be made because the target machine actively refused it>

I guess it has to do with a firewall or security issue. Can someone help with what should be done?

Check out http://stackoverflow.com/questions/1450132/proxy-with-urllib2 – Anish Shah, Dec 29 '16 at 10:32
@AnishShah The code snippet in that example throws the same error: URLError: – Indi, Dec 30 '16 at 09:27
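
For reference, the proxy approach from the linked question could look roughly like the sketch below. It is written for Python 3, where `urllib2` became `urllib.request`. The proxy address `proxy.example.com:8080` and the helper names `build_proxy_opener` and `fetch` are placeholders made up for illustration, not values taken from this question; you would need the real proxy address from your network settings or administrator.

```python
import urllib.request

# Hypothetical proxy address: replace it with the one your network actually uses.
PROXY_URL = "http://proxy.example.com:8080"


def build_proxy_opener(proxy_url):
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch(url, proxy_url=PROXY_URL, timeout=10):
    """Fetch url through the proxy and return the raw response body as bytes."""
    opener = build_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Calling `fetch(wiki)` would then return the page bytes, which can be passed to `BeautifulSoup` as before. If the proxy address is wrong or the proxy needs authentication, the call will still fail, so this only helps once the correct proxy details are known.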

Mohammad Yusuf · Answer 1 · 2016-12-29T12:06:30.983

You can try something like this with requests:

import requests
from bs4 import BeautifulSoup

wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"
page = requests.get(wiki).content
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(page, "html.parser")

If you are trying to get the table, you can use pandas like this:

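import pandas as pd

wiki = "https://en.wikipedia.org/wiki/List_of_state_and_union_territory_capitals_in_India"
# read_html returns one DataFrame per table on the page; take the second one
df = pd.read_html(wiki)[1]
df2 = df.copy()
# Use the first row as the column headers, then drop that row
df2.columns = df.iloc[0]
df2.drop(0, inplace=True)
# Remove the 'No.' column
df2.drop('No.', axis=1, inplace=True)
df2.head()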

edited Dec 29 '16 at 12:06

answered Dec 29 '16 at 10:31

Mohammad Yusuf


I end up with the same error when I try the first snippet: ConnectionError: HTTPSConnectionPool(host='en.wikipedia.org', port=443): Max retries exceeded with url: /wiki/List_of_state_and_union_territory_capitals_in_India (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 10061] No connection could be made because the target machine actively refused it',)) – Indi Dec 30 '16 at 09:00
@Indi I guess this has something to do with proxies. Read this: https://github.com/kennethreitz/requests/issues/2875 – Mohammad Yusuf Dec 30 '16 at 09:21
@Indi Can you tell me which versions you are using: `os`, `python`, `BeautifulSoup`, `requests`? – Mohammad Yusuf Dec 30 '16 at 09:39
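
Since both `urllib2` and `requests` fail with the same [Errno 10061], the problem is probably below the library level. One way to check the proxy guess might be the small diagnostic sketched below, which uses only the Python 3 standard library. The helper name `can_connect_directly` is made up for illustration, and a `False` result only shows that a direct connection is not possible from this machine, not why.

```python
import socket
import urllib.request


def can_connect_directly(host, port, timeout=5):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections (Errno 10061 on Windows),
        # timeouts, and DNS lookup failures.
        return False


def configured_proxies():
    """Return the proxies Python finds in environment variables or OS settings."""
    return urllib.request.getproxies()
```

If `can_connect_directly("en.wikipedia.org", 443)` returns `False` while a browser on the same machine can open the page, the browser is most likely going through a proxy. In that case `configured_proxies()` may show its address, which can then be passed to `requests.get` through its `proxies` argument.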
