I am trying to import a table from a website and then convert the data into a pandas DataFrame.

The website is: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

That's my code so far:

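import pandas as pd
import requests
from bs4 import BeautifulSoup

# Download the page and keep its HTML as text
html = requests.get(
    'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text

soup = BeautifulSoup(html, 'lxml')

# First table whose class is "wikitable sortable" (a bs4 Tag, not a DataFrame)
My_table = soup.find('table', {'class': 'wikitable sortable'})

# Plain text of the matching tables; each iteration overwrites the previous
# value, so only the last match is kept
for x in soup.find_all('table', {'class': 'wikitable sortable'}):
    table = x.text

print(My_table)
print(table)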

Output of print(My_table)

Output of print(table)

How do I convert this webpage table to a pandas DataFrame?

City
  • Answered here in terms of read_html https://stackoverflow.com/questions/55566117/i-have-some-problems-with-data-cleaning/55566486#55566486 Not sure whether that makes it a duplicate as you just want table. First part of solution will still work for that. – QHarr May 26 '19 at 19:24

1 Answer

Have you tried pd.read_html()?

Also, since the table has a very standard layout, why not copy it directly into Excel and import that as a DataFrame?
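A minimal sketch of the read_html() route follows. To stay runnable without network access it parses a small inline HTML string rather than the Wikipedia page; that string and its sample rows are assumptions made for illustration. With the real page, the URL from the question would be passed in place of the StringIO object. Note that read_html() returns a list of DataFrames, one per matching table, so the table still has to be picked out by index.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page: two tables, only one of
# which contains the text "Neighbourhood". The rows are sample data.
html = """
<table class="wikitable">
  <tr><th>Key</th><th>Value</th></tr>
  <tr><td>a</td><td>1</td></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

# match= keeps only the tables whose text matches the given string
tables = pd.read_html(StringIO(html), match="Neighbourhood")

# read_html always returns a list, even when a single table matches
df = tables[0]
print(df)
```

read_html() needs an HTML parser such as lxml (or bs4 together with html5lib) to be installed.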

Gen
  • Hey Gen, thnx for your answer. read_html() helped a lot but for some reason it does not contain the Neighbourhood column – City May 26 '19 at 19:37
  • pd.read_html(r'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M',match = 'Neighbourhood') – Gen May 26 '19 at 19:41
  • Strange, same result for me: df = pd.read_html(r'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', match='Neighbourhood'); print(df) – City May 26 '19 at 20:30
  • got it, thank you! df = pd.read_html(r'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M', match='Neighbourhood'); type(df); len(df); df = df[0]; df – City May 26 '19 at 20:52
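
For comparison, the table that the question's code finds with BeautifulSoup could also be turned into a DataFrame by hand. The following is only a sketch under stated assumptions: the inline HTML and its sample rows stand in for the downloaded page, the built-in "html.parser" is used instead of lxml to avoid an extra dependency, and the first row is assumed to be the header.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page HTML fetched with requests
html = """
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", {"class": "wikitable sortable"})

# Collect the stripped text of every header/data cell, row by row
rows = []
for tr in table.find_all("tr"):
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

# Assumption: the first row holds the column names
df = pd.DataFrame(rows[1:], columns=rows[0])
print(df)
```

This is more code than read_html(), but it gives full control over how each cell is cleaned.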