How to exclude a tag from the result of find_all in BeautifulSoup

Question

I'm trying out Beautiful Soup and using the code below to extract some data.

import requests
from bs4 import BeautifulSoup

# url_fii and headers are defined elsewhere
response = requests.get(url_fii, headers=headers)
response.encoding = 'utf-8'
soup = BeautifulSoup(response.text, 'html.parser')

print(soup)

I get the following output:

<table class="holiday_list" width="100%">
<tr>
<th colspan="5" style="text-align:center">Heading line</th>
</tr>
<tr>
<th style="text-align:center">Category</th>
<th style="text-align:center">Date</th>
<th style="text-align:center">Value 1</th>
<th style="text-align:center">Value 2</th>
<th style="text-align:center">Value 3</th>
</tr>
<tr class="alt">
<td class="first"> Quantity</td>
<td class="date">09-Apr-2020</td>
<td class="number">7277.03</td>
<td class="number">5539.41</td>
<td class="number">1737.62</td>
</tr>
</table>

The data I'm interested in is enclosed in <tr> tags.

With the code below, I'm able to get everything I want:

for p in soup('tr'):
    print(p.text)

Output:

Heading line


Category
Date
Value 1
Value 2
Value 3


 Quantity
09-Apr-2020
7277.03
5539.41
1737.62

The only unwanted part is 'Heading line'. Since it is also enclosed in a <tr>, it appears in the output too. However, I notice that its <th> has an extra attribute, 'colspan'. How can I use that as a filter so that 'Heading line' doesn't show up in the output?

Wouldn't it be better to change 'tr' to 'td'? `for o in soup('td'):` — r-beginners, Apr 11 '20 at 09:30
@Junitar your suggestion worked. Thanks a lot. It would be better if you can explain it in the answer. — Vishal Sharma, Apr 11 '20 at 09:50
@r-beginners Some of the output I want will be left out if I go with your suggestion. — Vishal Sharma, Apr 11 '20 at 09:52

score 3 · Accepted Answer · answered Apr 11 '20 at 10:16

You could skip the first element of the list returned by soup('tr') using slice notation, like so:

for p in soup('tr')[1:]:
    print(p.text)

Please see this post for more information about slice notation.
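Slicing works as long as the heading row is always the first one. If you would rather filter on the colspan attribute, as the question asks, something like the following might work. This is only a minimal sketch: the html string is a trimmed copy of the table from the question, and the names data_rows and rows_text are made up for illustration.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (illustrative sample data).
html = """
<table class="holiday_list" width="100%">
<tr><th colspan="5" style="text-align:center">Heading line</th></tr>
<tr><th>Category</th><th>Date</th><th>Value 1</th><th>Value 2</th><th>Value 3</th></tr>
<tr class="alt"><td class="first"> Quantity</td><td class="date">09-Apr-2020</td>
<td class="number">7277.03</td><td class="number">5539.41</td><td class="number">1737.62</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Keep only the rows that do NOT contain a <th> carrying a colspan attribute.
# Passing colspan=True matches any <th> that has the attribute, whatever its value.
data_rows = [
    tr for tr in soup.find_all("tr")
    if tr.find("th", colspan=True) is None
]

# Collect the stripped text of every header or data cell, row by row.
rows_text = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in data_rows
]

for row in rows_text:
    print(row)
```

Unlike the slice, this does not depend on where the heading row sits in the table, but it would also drop any other row that has a <th> with colspan.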

Junitar


Please have a look at my code. You might have some comments on that. – Vishal Sharma Apr 11 '20 at 10:32

score 0 · Answer 2 · answered Apr 11 '20 at 10:31

Based on the answer above, here's the code that I'm using now:

response = requests.get(url_fii, headers=headers)
response.encoding = 'utf-8'
soup = BeautifulSoup(response.text,'lxml')

for p in soup('tr')[1:]:
    binNames = p.find_all('th')
    binValues = p.find_all('td')
    nBins = 0
    nValues = 0

    # The below section is for calculating the size of binNames. I didn't know of a better way than this.
    for i in binNames:
        if len(i) > 0:
            nBins += 1

    # Now we print the binNames
    if nBins > 0:
        for i in range(nBins):
            print(binNames[i].text)

    # The below section is for calculating the size of binValues. I didn't know of a better way than this.
    for i in binValues:
        if len(i) > 0:
            nValues += 1

    # Now we print the binValues
    if nValues > 0:
        for i in range(nValues):
            print(binValues[i].text)
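The counting loops may not be needed: find_all returns a list-like ResultSet, so len() works on it directly and it can be iterated without an index. A shorter sketch of the same idea is below. It assumes the table from the question (the html string is a trimmed sample), uses 'html.parser' so it runs without lxml installed, and collects the text in a made-up list called lines instead of printing inside the loop.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (illustrative sample data).
html = """
<table class="holiday_list" width="100%">
<tr><th colspan="5" style="text-align:center">Heading line</th></tr>
<tr><th>Category</th><th>Date</th><th>Value 1</th><th>Value 2</th><th>Value 3</th></tr>
<tr class="alt"><td class="first"> Quantity</td><td class="date">09-Apr-2020</td>
<td class="number">7277.03</td><td class="number">5539.41</td><td class="number">1737.62</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

lines = []
for row in soup.find_all("tr")[1:]:      # skip the first (heading) row
    bin_names = row.find_all("th")       # header cells in this row
    bin_values = row.find_all("td")      # data cells in this row

    # Iterating an empty result simply does nothing, so no size check is needed;
    # len(bin_names) and len(bin_values) are available if the counts matter.
    for cells in (bin_names, bin_values):
        for cell in cells:
            lines.append(cell.get_text(strip=True))

print("\n".join(lines))
```

One difference from the code above: get_text(strip=True) trims surrounding whitespace, so ' Quantity' comes out as 'Quantity'.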
