How to scrape data from the AKC Dog Registration site with Beautiful Soup?

Question

I am trying to scrape data from the American Kennel Club (https://www.akc.org/reg/dogreg_stats.cfm) and I have been having some trouble. I am following a Stack Overflow post, and I can get all of the rows of the second table, but I cannot format them.

So here is my code.

from bs4 import BeautifulSoup
import requests
url = 'https://www.akc.org/reg/dogreg_stats.cfm'
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
rows = soup.find_all('table')[1].find_all('tr')

for row in rows:
    cells = soup.find_all('td')
    firstRanking = cell[1].get_text()
    print(firstRanking)

All it prints out is

More on Registration   Trends:
More on Registration   Trends:
More on Registration   Trends:
More on Registration   Trends:
More on Registration   Trends:
More on Registration   Trends:
More on Registration   Trends:

Instead of the actual rankings.

score 1 · Accepted Answer · edited Sep 25 '14 at 20:17

When you create your variable "cells", you want to be finding all 'td' elements of that ROW, not of the entire "soup" object.

It should look like this:

cells = row.find_all('td')

Also, I believe there is an error in the line after this: it should reference "cells", not "cell":

firstRanking = cells[1].get_text()

This will make the for loop look like this:

for row in rows:
  cells = row.find_all('td')
  firstRanking = cells[1].get_text()
  print(firstRanking)
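
To see why the original loop printed the same text on every pass, here is a minimal sketch that runs against a small inline HTML snippet instead of the live site. The markup in SAMPLE_HTML and the function names rankings and rankings_buggy are made up for illustration; the real AKC page layout is not shown in the question and may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; the real AKC markup is an assumption.
SAMPLE_HTML = """
<table><tr><td>Navigation</td><td>More on Registration Trends:</td></tr></table>
<table>
  <tr><th>Breed</th><th>2013</th></tr>
  <tr><td>Labrador Retriever</td><td>1</td></tr>
  <tr><td>German Shepherd Dog</td><td>2</td></tr>
</table>
"""


def rankings_buggy(html):
    """Search the whole document on every pass, as the question's loop did."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("table")[1].find_all("tr")
    result = []
    for row in rows:
        cells = soup.find_all("td")  # every <td> in the page, not just this row
        result.append(cells[1].get_text(strip=True))
    return result


def rankings(html):
    """Search only inside the current row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = soup.find_all("table")[1].find_all("tr")
    result = []
    for row in rows:
        cells = row.find_all("td")  # <td> elements of this row only
        if len(cells) < 2:          # skip header rows that contain only <th>
            continue
        result.append(cells[1].get_text(strip=True))
    return result


print(rankings_buggy(SAMPLE_HTML))
print(rankings(SAMPLE_HTML))
```

The length check is an extra guard that is not part of the answer above: a header row built from th elements has no td cells, so cells[1] would otherwise raise an IndexError.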

score 0 · Answer 2 · answered Sep 26 '14 at 06:26

The main mistake I made was on this line: rows = soup.find_all('table')[1].find_all('tr'), which created a list item. To fix the problem, I changed the line to table = soup.find_all('table')[1] and then rows = table.find_all('tr').
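
A minimal sketch of that two-step form, run against a made-up HTML snippet rather than the live page (SAMPLE_HTML and the variable names are illustrative assumptions). As far as I know, find_all returns a list-like ResultSet, and indexing it gives a single Tag on which find_all can be called again.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; the real AKC markup is an assumption.
SAMPLE_HTML = """
<table><tr><td>Menu</td></tr></table>
<table>
  <tr><td>Labrador Retriever</td><td>1</td></tr>
  <tr><td>German Shepherd Dog</td><td>2</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

tables = soup.find_all("table")  # list-like ResultSet of Tag objects
table = tables[1]                # a single Tag: the second table on the page
rows = table.find_all("tr")      # only the rows inside that table

# Take the second cell of each row.
second_column = [row.find_all("td")[1].get_text(strip=True) for row in rows]
print(second_column)
```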

Zaynaib Giwa
