Import table from website with BeautifulSoup

Question

I am trying to import a table from a website and afterwards transform the data into a pandas dataframe.

The website is: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

That's my code so far:

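import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Despite the name, website_url holds the page's HTML text, not the URL.
website_url = requests.get(
    'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

soup = BeautifulSoup(website_url, 'lxml')

# First table with the class "wikitable sortable", as a Tag object
My_table = soup.find('table', {'class': 'wikitable sortable'})

# Plain text of each matching table; only the last one is kept
for x in soup.find_all('table', {'class': 'wikitable sortable'}):
    table = x.text

print(My_table)
print(table)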

Output of print(My_table)

Output of print(table)

How do I convert this webpage table to a pandas DataFrame?

This is answered here in terms of read_html: https://stackoverflow.com/questions/55566117/i-have-some-problems-with-data-cleaning/55566486#55566486 I'm not sure whether that makes this a duplicate, since you only want the table, but the first part of that solution will still work. – QHarr May 26 '19 at 19:24

score 1 · Accepted Answer · answered May 26 '19 at 19:07


Have you tried pd.read_html()?

Also, since the table is very standard, why not copy it directly into Excel and import it as a DataFrame?
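
A minimal sketch of the read_html route, assuming pandas and an HTML parser such as lxml are installed. To keep it runnable offline, it parses a small inline HTML string (SAMPLE_HTML, a hypothetical stand-in with illustrative rows, not a copy of the page) instead of the Wikipedia URL. Note that read_html returns a list of DataFrames rather than a single one, and match keeps only the tables whose text matches the given string.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the Wikipedia page (illustrative rows only),
# so that the sketch runs without network access.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
<table>
  <tr><th>Other</th></tr>
  <tr><td>unrelated</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per table that matches.
# match keeps only tables containing text that matches the given string.
tables = pd.read_html(StringIO(SAMPLE_HTML), match="Neighbourhood")

# Pick the first (here: only) matching table to get a single DataFrame.
df = tables[0]
print(df)
```

With the real page, the URL string can be passed in place of the StringIO object; which index holds the wanted table may depend on the page's current layout.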


Gen


Hey Gen, thanks for your answer. read_html() helped a lot, but for some reason the result does not contain the Neighbourhood column. – City May 26 '19 at 19:37
pd.read_html(r'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', match='Neighbourhood') – Gen May 26 '19 at 19:41
Strange, same result for me: df = pd.read_html(r'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', match='Neighbourhood'); print(df) – City May 26 '19 at 20:30
Got it, thank you! df = pd.read_html(r'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', match='Neighbourhood'); type(df); len(df); df = df[0]; df – City May 26 '19 at 20:52
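
For the BeautifulSoup route the question started with, the table found by soup.find can also be turned into a DataFrame by collecting the cell text row by row. This is only a sketch under assumptions: SAMPLE_HTML is a hypothetical stand-in for the page with illustrative rows, the first row is assumed to be the header, and the built-in "html.parser" is used here in place of the question's 'lxml' so that no extra parser is needed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia page (illustrative rows only).
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
my_table = soup.find("table", {"class": "wikitable sortable"})

# Collect the stripped text of every header or data cell, one list per row.
rows = []
for tr in my_table.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

# Assumption: the first collected row holds the column names.
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df)
```

Tables with merged cells (rowspan or colspan) would need extra handling that this sketch does not attempt.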
