I'd like to extract the text from between the table cell tags with Beautiful Soup. So far I have:

def table_to_text(html):
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html)
    trs = soup.findAll('tr')
    for tr in trs:
        print 'row '
        print tr.findAll(['td','th']).text

This gives me output that looks like:

row 
[<td> AAA </td>, <td>Chi</td>, <td></td>, <td class="center"><span class="blue">1353</span>/<span class="red">23</span></td>]/n

I'd like to get the output to look like:

[ AAA , Chi, , 1353, 23]

How can I do this?

user1592380

1 Answer

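.findAll returns a list of tags, not a single tag, so the list itself has no .text. You need another loop over its elements, like this:

[el.text for el in tr.find_all(['td', 'th']) if el.text]

Here tr is the row from your outer loop. The trailing "if el.text" skips empty cells; drop it if you want to keep them.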
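
For reference, here is a minimal, self-contained sketch of the same idea in Python 3. The function name table_to_rows and the sample HTML string are made up for illustration, and the parser is set to "html.parser" as an assumption so that nothing beyond bs4 needs to be installed. It keeps empty cells, and it does not split the two spans in the last cell, so that cell comes back as a single string.

```python
from bs4 import BeautifulSoup


def table_to_rows(html):
    """Return one list of cell strings per <tr> found in the HTML."""
    # "html.parser" is the parser bundled with Python (an assumption here;
    # the original code did not name a parser).
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        # find_all returns a list of tags, so read the text cell by cell.
        rows.append([cell.get_text() for cell in tr.find_all(["td", "th"])])
    return rows


# Hypothetical sample row, modelled on the output shown in the question.
sample = (
    '<table><tr><td> AAA </td><td>Chi</td><td></td>'
    '<td class="center"><span class="blue">1353</span>/'
    '<span class="red">23</span></td></tr></table>'
)

rows = table_to_rows(sample)
print(rows)
```

If the surrounding whitespace is not wanted, get_text(strip=True) can be used in place of get_text().
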
styvane