Python requests.get() loop returns nothing

Question

When trying to scrape multiple pages of this website, I get no content in return. I usually check to make sure all the lists I'm creating are of equal length, but all are coming back as len = 0.

I've used similar code to scrape other websites, so why does this code not work correctly?

Solutions I've tried that haven't worked for my purposes: requests.Session(), as suggested in this answer, and .json, as suggested here.

for page in range(100, 350):

    page = requests.get("https://www.ghanaweb.com/GhanaHomePage/election2012/parliament.constituency.php?ID=" + str(page) + "&res=pm")

    page.encoding = page.apparent_encoding

    if not page:
        pass

    else:

        soup = BeautifulSoup(page.text, 'html.parser')

        ghana_tbody = soup.find_all('tbody')

        sleep(randint(2,10))

        for container in ghana_tbody:

            #### CANDIDATES ####
            candidate = container.find_all('div', class_='can par')
            for data in candidate:
                cand = data.find('h4')
                for info in cand:
                    if cand is not None:
                        can2 = info.get_text()
                        can.append(can2)

            #### PARTY NAMES ####
            partyn = container.find_all('h5')
            for data in partyn:
                if partyn is not None:
                    partyn2 = data.get_text()
                    pty_n.append(partyn2)

            #### CANDIDATE VOTES ####
            votec = container.find_all('td', class_='votes')
            for data in votec:
                if votec is not None:
                    votec2 = data.get_text()
                    cv1.append(votec2)

            #### CANDIDATE VOTE SHARE ####
            cansh = container.find_all('td', class_='percent')
            for data in cansh:
                if cansh is not None:
                    cansh2 = data.get_text()
                    cvs1.append(cansh2)

        #### TOTAL VOTES ####
        tfoot = soup.find_all('tr', class_='total')
        for footer in tfoot:
            fvote = footer.find_all('td', class_='votes')
            for data in fvote:
                if fvote is not None:
                    fvote2 = data.get_text()
                    fvoteindiv = [fvote2]
                    fvotelist = fvoteindiv * (len(pty_n) - len(vot1))
                    vot1.extend(fvotelist)

Thanks in advance for your help!

You need to first fix your indentation; it is not valid. And what is the point of the call to `sleep`? — Booboo, Nov 11 '20 at 18:12
@Booboo thanks; it's correct in my code, but not on the site. Fixed now. I call `sleep` to pause briefly before the code runs each loop to not overwhelm the server I'm scraping from/get blocked in the process. — Sara, Nov 11 '20 at 18:26
If you look at, for example, `https://www.ghanaweb.com/GhanaHomePage/election2012/parliament.constituency.php?ID=100&res=pm`, I don't see any data. If you do a `view source` on the page and search for `can per` (the class name on the `<div>`), it is not to be found. — Booboo, Nov 11 '20 at 18:28
@Booboo That's true; but for pages like `https://www.ghanaweb.com/GhanaHomePage/election2012/parliament.constituency.php?ID=112&res=pm` there is a `<div>` with the class name `can per`. Is there a reason info isn't being scraped from any pages in the range `100, 350`? — Sara, Nov 11 '20 at 18:49

score 1 · Accepted Answer · answered Nov 11 '20 at 19:30

I've made some simplifications. The two changes that actually mattered were:

ghana_tbody = soup.find_all('table', class_='canResults')
can2 = info # not info.get_text()
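A small offline sketch of what these two changes do. The HTML here is made up (an assumption loosely based on the class names in the question, not a copy of the real page). One likely reason the original search found nothing: browsers add a `<tbody>` when they build the DOM, but `html.parser` keeps the markup as served, so if the source has no explicit `<tbody>`, `find_all('tbody')` returns an empty list.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's structure may differ.
SAMPLE_HTML = """
<table class="canResults">
  <tr>
    <td><div class="can par"><h4>Candidate A</h4><h5>NPP</h5></div></td>
    <td class="votes">10</td>
    <td class="percent">100.00</td>
  </tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# html.parser does not insert a <tbody> that is missing from the source.
tbody_matches = soup.find_all("tbody")
table_matches = soup.find_all("table", class_="canResults")

# Iterating over a Tag yields its children. For this <h4> that is a single
# text node, which is why `can2 = info` already holds the candidate's name.
h4 = table_matches[0].find("h4")
children = [str(child) for child in h4]

print(len(tbody_matches), len(table_matches), children)
```

If the `<h4>` could ever contain nested tags, calling `h4.get_text(strip=True)` on the tag itself would probably be the safer way to get the name.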

I have only tested this against page 112; life is too short.

import requests
from bs4 import BeautifulSoup
from random import randint
from time import sleep

can = []
pty_n = []
cv1 = []
cvs1 = []
vot1 = []

START_PAGE = 112
END_PAGE = 112

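for page_id in range(START_PAGE, END_PAGE + 1):
    # Build the URL from the loop variable so each iteration requests a different page.
    page = requests.get("https://www.ghanaweb.com/GhanaHomePage/election2012/parliament.constituency.php?ID=" + str(page_id) + "&res=pm")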
    page.encoding = page.apparent_encoding
    if not page:
        pass
    else:
        soup = BeautifulSoup(page.text, 'html.parser')
        ghana_tbody = soup.find_all('table', class_='canResults')
        sleep(randint(2,10))
        for container in ghana_tbody:

            #### CANDIDATES ####
            candidate = container.find_all('div', class_='can par')
            for data in candidate:
                cand = data.find('h4')
                for info in cand:
                    can2 = info # not info.get_text()
                    can.append(can2)

            #### PARTY NAMES ####
            partyn = container.find_all('h5')
            for data in partyn:
                partyn2 = data.get_text()
                pty_n.append(partyn2)


            #### CANDIDATE VOTES ####
            votec = container.find_all('td', class_='votes')
            for data in votec:
                votec2 = data.get_text()
                cv1.append(votec2)

            #### CANDIDATE VOTE SHARE ####
            cansh = container.find_all('td', class_='percent')
            for data in cansh:
                cansh2 = data.get_text()
                cvs1.append(cansh2)

        #### TOTAL VOTES ####
        tfoot = soup.find_all('tr', class_='total')
        for footer in tfoot:
            fvote = footer.find_all('td', class_='votes')
            for data in fvote:
                fvote2 = data.get_text()
                fvoteindiv = [fvote2]
                fvotelist = fvoteindiv * (len(pty_n) - len(vot1))
                vot1.extend(fvotelist)

print('can = ', can)
print('pty_n = ', pty_n)
print('cv1 = ', cv1)
print('cvs1 = ', cvs1)
print('vot1 = ', vot1)

Prints:

can =  ['Kwadwo Baah Agyemang', 'Daniel Osei', 'Anyang - Kusi Samuel', 'Mary Awusi']
pty_n =  ['NPP', 'NDC', 'IND', 'IND']
cv1 =  ['14,966', '9,709', '8,648', '969', '34292']
cvs1 =  ['43.64', '28.31', '25.22', '2.83', '\xa0']
vot1 =  ['34292', '34292', '34292', '34292']
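Note that `cv1` and `cvs1` have five entries while `can` and `pty_n` have four, apparently because the totals row carries `td` cells with the same `votes` and `percent` classes. If equal-length lists matter, one option is to build one record per table row and skip the `total` row. The sketch below runs on made-up HTML; the markup, the `parse_results` name, and the record keys are all assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's structure may differ.
SAMPLE_HTML = """
<table class="canResults">
  <tr>
    <td><div class="can par"><h4>Candidate A</h4><h5>NPP</h5></div></td>
    <td class="votes">600</td>
    <td class="percent">60.00</td>
  </tr>
  <tr>
    <td><div class="can par"><h4>Candidate B</h4><h5>NDC</h5></div></td>
    <td class="votes">400</td>
    <td class="percent">40.00</td>
  </tr>
  <tr class="total">
    <td>Total</td>
    <td class="votes">1000</td>
    <td class="percent"></td>
  </tr>
</table>
"""


def to_int(text):
    """Convert a vote count such as '14,966' to an integer."""
    return int(text.replace(",", "").strip())


def parse_results(html):
    """Return one dict per candidate row, each carrying the table's total."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for table in soup.find_all("table", class_="canResults"):
        # Look up the total once per table.
        total_row = table.find("tr", class_="total")
        total_cell = total_row.find("td", class_="votes") if total_row else None
        total = to_int(total_cell.get_text()) if total_cell else None

        for row in table.find_all("tr"):
            if "total" in row.get("class", []):
                continue  # the totals row is not a candidate
            cells = (
                row.find("h4"),
                row.find("h5"),
                row.find("td", class_="votes"),
                row.find("td", class_="percent"),
            )
            if any(cell is None for cell in cells):
                continue  # skip header or malformed rows
            name, party, votes, share = cells
            records.append({
                "candidate": name.get_text(strip=True),
                "party": party.get_text(strip=True),
                "votes": to_int(votes.get_text()),
                "share": share.get_text(strip=True),
                "total": total,
            })
    return records


records = parse_results(SAMPLE_HTML)
print(records)
```

Because every field of a record comes from the same row, the fields cannot drift out of step the way parallel lists can.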

Be sure to first change START_PAGE and END_PAGE to 100 and 350, respectively.
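As the comments point out, some IDs in that range (100, for example) show no data, so over the full range it may help to skip such pages explicitly. A rough sketch, with assumptions stated: `build_url` and `collect_pages` are hypothetical helper names, `fetch` is any callable that returns the page's HTML text or `None` on failure, and "has data" is approximated by checking for the `canResults` class name used above.

```python
import time

BASE_URL = "https://www.ghanaweb.com/GhanaHomePage/election2012/parliament.constituency.php"


def build_url(page_id):
    """Build the results URL for one constituency ID."""
    return f"{BASE_URL}?ID={page_id}&res=pm"


def collect_pages(page_ids, fetch, delay_seconds=0.0):
    """Return {page_id: html} for the pages that appear to contain results.

    `fetch(url)` is expected to return HTML text, or None if the request failed.
    """
    pages = {}
    for page_id in page_ids:
        html = fetch(build_url(page_id))
        if delay_seconds:
            time.sleep(delay_seconds)  # be polite to the server between requests
        # Assumption: a page with data contains the results table's class name.
        if not html or "canResults" not in html:
            continue
        pages[page_id] = html
    return pages


# With requests, `fetch` could look like this (not executed here):
#
# def fetch(url):
#     response = requests.get(url, timeout=10)
#     return response.text if response.ok else None
#
# pages = collect_pages(range(100, 351), fetch, delay_seconds=2)
```

Keeping the fetching separate from the parsing also means the parsing can be tested on saved HTML without hitting the site.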
