Getting text from table row with beautiful soup

Question

I'd like to use Beautiful Soup to extract the text from the cells of each table row. So far I have:

def table_to_text(html):
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    trs = soup.findAll('tr')
    for tr in trs:
        print('row ')
        print(tr.findAll(['td','th']).text)

This gives me output that looks like:

row 
[<td> AAA </td>, <td>Chi</td>, <td></td>, <td class="center"><span class="blue">1353</span>/<span class="red">23</span></td>]

I'd like the output to look like this:

[ AAA , Chi, , 1353, 23]

How can I do this?

score 1 · Accepted Answer · answered Aug 13 '15 at 21:53

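findAll returns a list (a ResultSet), which has no .text attribute, so you need to take .text from each element, for example with a list comprehension inside your row loop: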

[el.text for el in tr.find_all(['td', 'th']) if el.text]

styvane
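A minimal, self-contained sketch of the answer's approach, assuming bs4 is installed and using a sample row reconstructed from the output printed in the question (the HTML string and both function names are illustrative, not from the thread). It also shows that the comprehension does not give exactly the format asked for: the "if el.text" filter drops the empty cell, and .text on the last cell joins both spans into '1353/23'. The second function is a hypothetical variant that keeps empty cells and emits one entry per <span>; it matches the desired output for this row, but whether splitting on spans suits other rows is an assumption.

```python
from bs4 import BeautifulSoup

# Sample row reconstructed from the output shown in the question (assumption).
HTML = """<table><tr><td> AAA </td><td>Chi</td><td></td>
<td class="center"><span class="blue">1353</span>/<span class="red">23</span></td></tr></table>"""


def rows_as_in_answer(html):
    """The answer's comprehension: one list of cell texts per <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a list-like ResultSet, so .text is read per element.
    # The `if el.text` filter drops empty cells such as <td></td>.
    return [[el.text for el in tr.find_all(['td', 'th']) if el.text]
            for tr in soup.find_all('tr')]


def rows_split_on_spans(html):
    """Hypothetical variant: keep empty cells and split cells built from <span>s."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all('tr'):
        texts = []
        for cell in tr.find_all(['td', 'th']):
            spans = cell.find_all('span')
            if spans:
                # One entry per <span>, so '1353/23' becomes '1353' and '23'.
                texts.extend(span.get_text() for span in spans)
            else:
                # Plain cells (including empty ones) contribute their full text.
                texts.append(cell.get_text())
        rows.append(texts)
    return rows


print(rows_as_in_answer(HTML))    # [[' AAA ', 'Chi', '1353/23']]
print(rows_split_on_spans(HTML))  # [[' AAA ', 'Chi', '', '1353', '23']]
```

If the surrounding spaces in ' AAA ' are not wanted, get_text(strip=True) returns the text with leading and trailing whitespace removed.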

Thank you very much, that works! Can I ask what the difference is between findAll and find_all? – user1592380 Aug 14 '15 at 19:19
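The comment's question is not answered in the thread. To the best of my knowledge, findAll is the Beautiful Soup 3 camelCase name, which Beautiful Soup 4 keeps as an alias of find_all, the PEP 8 style name used in the bs4 documentation; newer bs4 releases may emit a deprecation warning for the old name. A quick check, assuming bs4 is installed (the sample HTML is illustrative):

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<table><tr><td>a</td><td>b</td></tr></table>", "html.parser")

old_style = soup.findAll('td')   # legacy camelCase alias (may warn in newer bs4)
new_style = soup.find_all('td')  # name used in the bs4 documentation

# Both calls return the same Tag objects from the same parse tree.
print(old_style == new_style)         # True
print([td.text for td in new_style])  # ['a', 'b']
```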
