Locating table with no id or class attributes

Question

I am trying to scrape a site that has a few tables. None of the tables has a class or an id, and the site doesn't really use either attribute, so I am not sure whether there is a way for me to get the data. Here is the link to the site; I would post the HTML, but it would be too long.

http://epi.hbsna.com/products/dept.asp?msi=0&sid=6076533CE8C648AE9883BDDBED795B29&dept_id=315&parent_id=0

The table I am trying to extract begins on line 310.

score 9 · Accepted Answer · answered Mar 02 '16 at 03:20

Since this is a BeautifulSoup-specific question, here is a working BeautifulSoup-specific solution. The idea is to find the element containing the SKU# text and then locate its closest enclosing table:

import requests
from bs4 import BeautifulSoup


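# Download the page; .content is the raw response body (bytes)
data = requests.get('http://epi.hbsna.com/products/dept.asp?msi=0&sid=6076533CE8C648AE9883BDDBED795B29&dept_id=315&parent_id=0').content
soup = BeautifulSoup(data, "html.parser")

# Find the text node that is exactly "SKU#", then climb to the nearest enclosing <table>
table = soup.find(text="SKU#").find_parent("table")
# Skip the header row and print the cell texts of every remaining row
for row in table.find_all("tr")[1:]:
    print([cell.get_text(strip=True) for cell in row.find_all("td")])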

Prints the contents of the table:

['40010001', 'ABA Service Kit', '-', '1-1/4" 10', 'None', '5-1/2"', '0.63', 'Clamp', '42710566']
['40010002', 'ABA Service Kit', '-', '1-1/4" 10', '5/8" RH', '5-1/2"', '0.63', 'Clamp', '42710566']
...
['40010649', 'ABA Service Kit', '-', '1 1/2 - 10', '1.5', '6"', '0.50', 'Strap', '427-10517']
['40050604', 'ABA Service Kit', 'none', '1 1/2" - 10"', '1 1/2" LH', '6"', '0.50', 'Strap', '427-10601']
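
The same lookup can be tried without a network request. The sketch below is only an illustration: the stand-in HTML snippet, the helper name rows_of_table_containing, and the None checks are assumptions that are not part of the original answer. It uses string=, which is the current name (BeautifulSoup 4.4.0 and later) for the older text= argument.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup: two tables, neither with an id or class.
HTML = """
<table><tr><td>Navigation</td></tr></table>
<table>
  <tr><th>SKU#</th><th>Description</th></tr>
  <tr><td>40010001</td><td>ABA Service Kit</td></tr>
  <tr><td>40010002</td><td>ABA Service Kit</td></tr>
</table>
"""


def rows_of_table_containing(html, marker):
    """Return the data rows of the first table holding `marker` as an exact text node."""
    soup = BeautifulSoup(html, "html.parser")
    text_node = soup.find(string=marker)  # exact match on a text node
    if text_node is None:
        return []  # marker text not present anywhere in the document
    table = text_node.find_parent("table")  # nearest enclosing <table>
    if table is None:
        return []  # marker text exists, but not inside a table
    # Skip the header row, then collect the stripped text of each <td>
    return [
        [cell.get_text(strip=True) for cell in row.find_all("td")]
        for row in table.find_all("tr")[1:]
    ]


rows = rows_of_table_containing(HTML, "SKU#")
print(rows)
```

Returning an empty list when the marker is missing avoids the AttributeError that chaining find(...).find_parent(...) would raise if find returned None.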

Thank you - that looks perfect - does this code work for 3.5 - I am getting some errors — PatrickP76, Mar 02 '16 at 03:42
@PatrickP76 yeah, tested on 3.5. What error do you get? Thanks. — alecxe, Mar 02 '16 at 03:43
Do not worry - I was able to figure it out - you are the best - I just had to change the request to the 3.5 version — PatrickP76, Mar 02 '16 at 03:46

score 2 · Answer 2 · edited May 23 '17 at 12:08

How do you feel about using this xpath expression?

//*[./text()="SKU#"]/ancestor::table[1]

It means, "find any element that has a text node exactly equal to SKU#, then select its closest table ancestor."

You can try it out in a browser inspector by passing the expression as a string to the $x function.

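See this answer for working with xpath in beautifulsoup. Note that BeautifulSoup does not evaluate XPath itself, so a library such as lxml is normally used for that part.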
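
For reference, here is a rough sketch of running that expression from Python. It assumes the third-party lxml library is installed, and the sample markup is a made-up stand-in rather than the real page.

```python
from lxml import html

# Hypothetical stand-in markup: two tables, neither with an id or class.
PAGE = """<html><body>
<table><tr><td>Navigation</td></tr></table>
<table>
  <tr><th>SKU#</th><th>Description</th></tr>
  <tr><td>40010001</td><td>ABA Service Kit</td></tr>
  <tr><td>40010002</td><td>ABA Service Kit</td></tr>
</table>
</body></html>"""

tree = html.fromstring(PAGE)

# xpath() returns a list: one closest <table> ancestor per matching element
tables = tree.xpath('//*[./text()="SKU#"]/ancestor::table[1]')

rows = []
if tables:
    # Skip the header row, then collect the stripped text of each <td>
    for tr in tables[0].xpath(".//tr")[1:]:
        rows.append([td.text_content().strip() for td in tr.xpath("./td")])

print(rows)
```

Because xpath() returns a list, checking that it is non-empty before indexing keeps the script from failing when the marker text is absent.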

answered Mar 02 '16 at 02:51

allonhadaya

If there's a risk that SKU# will appear elsewhere in the document, you can choose any other bit of text that will always appear only in the table. – allonhadaya Mar 02 '16 at 02:51
I am new and haven't tried or even heard of xpath - I will research and hopefully that will do it. Thank you. – PatrickP76 Mar 02 '16 at 03:07
@alecxe's answer is perfect for using just `beautifulsoup`, and it reads very clearly! `xpath` is a language for navigating xml documents that has implementations in most programming languages. It's worth checking out as a part of your web scraping toolkit. :) – allonhadaya Mar 02 '16 at 04:21