
This is the complete HTML I work with.

This is a simplified version of the above HTML:

<table class="premium">
    <tr class="retailer top-offer" data-pricer="47.84" saler-id="123">...</tr>
    <tr class="retailer" data-pricer="57.11" saler-id="234">...</tr>
</table>
<table class="basic-supp">
    <tr class="retailer top-offer" data-pricer="41.87" saler-id="456">...</tr>
    <tr class="retailer" data-pricer="58.12" saler-id="567">...</tr>
</table>

I need to extract the values of the data-pricer="..." attributes from the TR tags inside the TABLE with class="basic-supp".

I tried this method on the simplified HTML:

from bs4 import BeautifulSoup
with open('file.html', 'r') as f:
    contents = f.read()
soup = BeautifulSoup(contents, 'lxml')
tags = soup.find_all('tr')
for tag in tags:
    print(tag.attrs['data-pricer'])

> 47.84
> 57.11
> 41.87
> 58.12

This is almost what I need, except that it takes values from both tables instead of only the table with class="basic-supp". Any idea how to fix it?

The main problem is that it doesn't work at all on the complete HTML I posted above. The error:

    print(tag.attrs['data-pricer'])
KeyError: 'data-pricer'

Can somebody give me advice please?

Thank you for your time!

P.S. This is not even a close duplicate of the post "Extracting an attribute value with beautifulsoup".

neznajut

2 Answers

1

First find the <tr> tags inside the table you want, then read the attribute with tr['data-pricer'].

Try this:

from bs4 import BeautifulSoup

html = '''
<table class="premium">
    <tr class="retailer top-offer" data-pricer="47.84" saler-id="123">...</tr>
    <tr class="retailer" data-pricer="57.11" saler-id="234">...</tr>
</table>
<table class="basic-supp">
    <tr class="retailer top-offer" data-pricer="41.87" saler-id="456">...</tr>
    <tr class="retailer" data-pricer="58.12" saler-id="567">...</tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')
# Restrict the search to tables with class "basic-supp" ...
for table in soup.find_all("table", {"class": "basic-supp"}):
    # ... and only then look at the rows inside each of them.
    for tr in table.find_all('tr'):
        print(tr['data-pricer'])
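
A note on the KeyError from the question: tr['data-pricer'] raises KeyError for any <tr> that does not carry that attribute. The sketch below assumes that is what happens on the full page (for example a header row without the attribute); the header row in the sample is made up for illustration and is not taken from the real HTML. It uses tr.get(), which returns None instead of raising.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the first row has no data-pricer attribute,
# which is an assumption about what the full page may contain.
html = '''
<table class="basic-supp">
    <tr class="header"><th>Retailer</th></tr>
    <tr class="retailer top-offer" data-pricer="41.87" saler-id="456"><td>...</td></tr>
    <tr class="retailer" data-pricer="58.12" saler-id="567"><td>...</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')
prices = []
for table in soup.find_all('table', class_='basic-supp'):
    for tr in table.find_all('tr'):
        price = tr.get('data-pricer')  # None when the attribute is absent
        if price is not None:
            prices.append(price)

print(prices)
```

With this sample the header row is skipped silently and only the two prices are collected.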
I'mahdi
  • I am not sure what is difference. This gives me values from both tables again. – neznajut Sep 12 '21 at 11:23
  • @neznajut edited my answer, is this correct? – I'mahdi Sep 12 '21 at 11:31
  • This is nice! Thank you! But this is only half a solution. It still doesn't work on the [complete html I work with](https://pastebin.com/yyMjmCYj). Maybe you can see where the problem is. – neznajut Sep 12 '21 at 11:49
  • In the complete version of my HTML the class names are a little bit different. Instead of class="supplier" there is class=". . . table-basic-supplier". Instead of class="data-pricer" there is class="data-priceek". I am trying to edit the code you offered, but it won't work. `soup = BeautifulSoup(html , 'html.parser') for table in soup.find_all("table", {"class": "table-basic-supplier"}): for tr in table.find_all('tr'): print(tr['data-priceek'])` – neznajut Sep 12 '21 at 12:02
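
For the renamed classes mentioned in the comment above, here is a minimal sketch. The markup is invented: the extra class names "table" and "striped" and the header row are placeholders, since the real page is not reproduced here. It relies on two documented BeautifulSoup behaviours: class_ matches a tag if any one of its classes equals the given name, and attrs={'name': True} matches only tags that have that attribute at all.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "table" and "striped" are invented extra classes,
# and the header row is an assumption about the real page.
html = '''
<table class="table striped table-basic-supplier">
    <tr><th>Retailer</th></tr>
    <tr data-priceek="41.87"><td>...</td></tr>
    <tr data-priceek="58.12"><td>...</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

# class_ matches even though the table carries several classes.
tables = soup.find_all('table', class_='table-basic-supplier')

# attrs={'data-priceek': True} keeps only rows that have the attribute.
values = [
    tr['data-priceek']
    for table in tables
    for tr in table.find_all('tr', attrs={'data-priceek': True})
]

print(values)
```

If tables comes back empty on the real page, the class name is not present in the parsed markup, and that is worth checking before looking at the rows.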
1

It's easier to just use CSS selectors:

data = []
for tr in soup.select('table.basic-supp tr'):
    data.append([tr['data-pricer'], tr['saler-id']])
print(data)

Or, if you prefer a list comprehension, a one-liner:

[[tr['data-pricer'], tr['saler-id']] for tr in soup.select('table.basic-supp tr')]

In either case, the output should be:

[['41.87', '456'], ['58.12', '567']]
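
If some rows in the table lack the attribute, the selector itself can filter them out. This is a sketch on made-up sample markup (the attribute-less row is an assumption, not taken from the real page): the CSS attribute selector tr[data-pricer] matches only rows that have a data-pricer attribute, so the subscript lookup cannot raise KeyError for it.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the row without attributes is added for illustration.
html = '''
<table class="premium">
    <tr class="retailer" data-pricer="47.84" saler-id="123"><td>...</td></tr>
</table>
<table class="basic-supp">
    <tr><th>no attributes here</th></tr>
    <tr class="retailer top-offer" data-pricer="41.87" saler-id="456"><td>...</td></tr>
    <tr class="retailer" data-pricer="58.12" saler-id="567"><td>...</td></tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

# Only rows inside table.basic-supp that actually carry data-pricer.
rows = soup.select('table.basic-supp tr[data-pricer]')

# saler-id is read with .get() because the selector does not guarantee it.
data = [[tr['data-pricer'], tr.get('saler-id')] for tr in rows]

print(data)
```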
Jack Fleeting
  • It still doesn't work on the complete HTML code I have: https://pastebin.com/yyMjmCYj The class names there are a bit different. Instead of class="supplier" there is class=". . . table-basic-supplier". Instead of class="data-pricer" there is class="data-priceek". I tried to change your code to use the new class names, but it doesn't work. – neznajut Sep 12 '21 at 12:08