import pandas as pd
from bs4 import BeautifulSoup
import numpy as np
import requests
from time import sleep
from random import randint
import re

towns = pd.DataFrame()

url = f"https://www.city-data.com/city/Adak-Alaska.html"
page = requests.get(url).text
doc = BeautifulSoup(page, "html.parser")

table_data = doc.findAll("td")
#for i in table_data:
   #towns.append(table_data[i])
print(table_data)

I'm trying to get the data from the tables, such as the number of adherents to each religion, ethnic groups, etc. When I look at the page source, all of that is between the td tags, but I don't see it when I print table_data. What am I doing wrong?

1 Answer

import pandas as pd
from bs4 import BeautifulSoup
import numpy as np
import requests
from time import sleep
from random import randint
import re

towns = pd.DataFrame()

url = "https://www.city-data.com/city/Adak-Alaska.html"
page = requests.get(url).text
doc = BeautifulSoup(page, "html.parser")

dfs = pd.read_html(page)
for x in dfs:
    print(x)  # do what you will with the data

For instance, the religions would be table 17 (dfs[17]):

                 Religion  Adherents  Congregations
0                Orthodox        754              6
1  Evangelical Protestant        232              3
2                Catholic        185              1
3                   Other        112              1
4     Mainline Protestant         82              1
5                    None       4196              -
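A position such as dfs[17] depends on the page layout, so it may shift if the site changes. As a minimal sketch, assuming `dfs` is the list returned by `pd.read_html`, a table can be picked by its column names instead. The helper `find_table` is a hypothetical name, and the small `dfs` list below is an illustrative stand-in so the snippet runs without fetching the page:

```python
import pandas as pd


def find_table(frames, required_columns):
    """Return the first DataFrame whose columns include every required name, else None."""
    wanted = set(required_columns)
    for frame in frames:
        # Column labels are not always strings (header-less tables get integers).
        if wanted.issubset(str(col) for col in frame.columns):
            return frame
    return None


# Illustrative stand-in for the list returned by pd.read_html(page).
dfs = [
    pd.DataFrame({0: ["a"], 1: ["b"]}),
    pd.DataFrame(
        {
            "Religion": ["Orthodox", "Catholic"],
            "Adherents": [754, 185],
            "Congregations": ["6", "1"],
        }
    ),
]

religions = find_table(dfs, ["Religion", "Adherents"])
print(religions)
```

If no table has the requested columns, the helper returns None, so check for that before using the result.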

EDIT: Given the OP's insurmountable issues with their Python install, a workaround that uses only requests and BeautifulSoup would be:

url = "https://www.city-data.com/city/Adak-Alaska.html"
soup = BeautifulSoup(requests.get(url).text, "html.parser")
for x in soup.select('table'):
    for z in x.select('tr'):
        print([y.text.strip() for y in z.find_all(['td', 'th'])])
    print('________________')

The printed rows can then be converted into DataFrames.
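One way to do that conversion, as a rough sketch: collect each table's rows into lists and treat the first row as the header. The helper `table_to_dataframe` and the inline `SAMPLE_HTML` are assumptions made for illustration (the sample stands in for the fetched page so the snippet runs offline), and real tables on the page may need extra handling for nested tables or merged cells:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for one table on the fetched page.
SAMPLE_HTML = """
<table>
  <tr><th>Religion</th><th>Adherents</th><th>Congregations</th></tr>
  <tr><td>Orthodox</td><td>754</td><td>6</td></tr>
  <tr><td>Catholic</td><td>185</td><td>1</td></tr>
</table>
"""


def table_to_dataframe(table):
    """Convert one <table> element into a DataFrame, using the first row as the header."""
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        for tr in table.find_all("tr")
    ]
    rows = [row for row in rows if row]  # drop rows that contain no cells
    if not rows:
        return pd.DataFrame()
    header, body = rows[0], rows[1:]
    # Keep only rows whose width matches the header so the frame stays rectangular.
    body = [row for row in body if len(row) == len(header)]
    return pd.DataFrame(body, columns=header)


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
frames = [table_to_dataframe(t) for t in soup.select("table")]
print(frames[0])
```

All values come back as strings here, so numeric columns would still need something like `pd.to_numeric` before doing arithmetic on them.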

Barry the Platipus