I would like to load a table from this Wikisource page into a pandas DataFrame in JupyterLab: https://pl.wikisource.org/wiki/Polskie_powiaty_wed%C5%82ug_kodu_TERYT
There is only one table and it is not complicated. How can I do that in Python?
Option 1:
Just use the pandas method pd.read_html. It returns a list of DataFrames, one per table found on the page, and you pick whichever one you want from that list.
import pandas as pd
res=pd.read_html("https://pl.wikisource.org/wiki/Polskie_powiaty_wed%C5%82ug_kodu_TERYT")
df=res[3]
Option 2:
You can use the requests and bs4 modules to find the table, then pass its HTML to pd.read_html to parse it into a DataFrame.
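from io import StringIO
import pandas as pd
import requests
from bs4 import BeautifulSoup
res=requests.get("https://pl.wikisource.org/wiki/Polskie_powiaty_wed%C5%82ug_kodu_TERYT")
soup=BeautifulSoup(res.text,"html.parser")
# pick the table by its position on the page
data=soup.find_all("table")[3]
# read_html returns a list; wrapping the HTML in StringIO avoids the
# deprecation warning newer pandas versions raise for literal strings
df=pd.read_html(StringIO(str(data)))
df[0]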
Output:
Nazwa powiatu TERYT
0 aleksandrowski 04 01
1 augustowski 20 01
. ..... ..
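The hard-coded index 3 above depends on how many tables the page happens to contain. A possibly more robust variant is the match argument of pd.read_html, which keeps only tables whose text matches a string or regex. The sketch below is only an illustration: SAMPLE_HTML is a made-up stand-in for the downloaded page (so it runs without network access), and it assumes an HTML parser such as lxml is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the downloaded page: two tables,
# only the second one is the TERYT table we care about.
SAMPLE_HTML = """
<table><tr><th>Menu</th></tr><tr><td>Home</td></tr></table>
<table border="1">
  <tr><th>Nazwa powiatu</th><th>TERYT</th></tr>
  <tr><td>aleksandrowski</td><td>04 01</td></tr>
  <tr><td>augustowski</td><td>20 01</td></tr>
</table>
"""

# match="TERYT" keeps only tables that contain the text "TERYT"
tables = pd.read_html(StringIO(SAMPLE_HTML), match="TERYT")
df = tables[0]
print(df)
```

On the real page you would pass the URL instead of StringIO(SAMPLE_HTML); whether "TERYT" singles out exactly one table there is an assumption you should check.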
You need to fetch the HTML with the requests library, then search for the tags you need with a parsing library (I use BeautifulSoup). The code looks something like this:
import requests
from bs4 import BeautifulSoup
URL = "https://pl.wikisource.org/wiki/Polskie_powiaty_wed%C5%82ug_kodu_TERYT"
page = requests.get(URL)
soup = BeautifulSoup(page.content, "html.parser")
results = soup.find("div", {"id":"mw-content-text"}).find("table",{"border":1}).find_all("td")
namelist = [results[i].text for i in range(0,len(results),2)]
numberlist = [results[i].text.strip('\n') for i in range(1,len(results),2)]
Each .text value is a string, and the two comprehensions collect them into lists (namelist and numberlist). It's very simple to convert these to pandas afterwards.
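For the conversion step, a minimal sketch could look like this. The two lists here are hypothetical sample values standing in for the scraped namelist and numberlist, and the column names are borrowed from the table header; adjust both to your data.

```python
import pandas as pd

# Hypothetical sample values standing in for the scraped lists
namelist = ["aleksandrowski", "augustowski"]
numberlist = ["04 01", "20 01"]

# Build the DataFrame column by column; strip() removes any
# leftover whitespace or newlines from the cell text
df = pd.DataFrame({
    "Nazwa powiatu": [name.strip() for name in namelist],
    "TERYT": [number.strip() for number in numberlist],
})
print(df)
```

Keeping the TERYT codes as strings preserves the leading zeros, which would be lost if they were converted to integers.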