This is HTML source:

<td style="padding-right: 10px;" valign="top">1.1</td>
<td valign="top">
         If applicable, do <a href="url"> link </a> to switch one to other mode.<br/>
</td>

From the HTML above, how can I extract only the text strings? I tried the code below. The first call works, but the second one returns None.

print(soup.find_all("td")[0].string)
print(soup.find_all("td")[1].string)
1.1
None
Shockyn
  • Welcome to StackOverflow. What is the output that you would want from a working version of the code? – Caridorc Feb 08 '23 at 00:51

1 Answer

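You should use the .text attribute rather than the .string attribute. In short, .string returns None when a tag has more than one child (your second td contains text, an a tag, and a br tag), whereas .text concatenates every string inside the tag. The details are explained here: Difference between .string and .text BeautifulSoup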
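
To make the difference concrete, here is a minimal sketch. It assumes bs4 is installed, and the HTML fragment and variable names are made up for illustration rather than taken from your page:

```python
import bs4

# Hypothetical fragment: the first <td> has a single child string,
# the second <td> mixes text with an <a> tag (several children).
html = "<td>1.1</td><td>do <a href='url'>link</a> now</td>"
soup = bs4.BeautifulSoup(html, "html.parser")
first, second = soup.find_all("td")

single_string = first.string   # the only child is a string, so it is returned
mixed_string = second.string   # several children, so .string is None
mixed_text = second.text       # all nested strings joined together

print(single_string)
print(mixed_string)
print(mixed_text)
```

So .string is only safe when you know the tag holds exactly one piece of text; .text (or get_text()) works in both cases.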

Here is the working version of your code:

import bs4
source="""
<td style="padding-right: 10px;" valign="top">1.1</td>
<td valign="top">
         If applicable, do <a href="url"> link </a> to switch one to other mode.<br/>
</td>
"""

soup = bs4.BeautifulSoup(source, "html.parser")
print(soup.find_all("td")[0].text)
print(soup.find_all("td")[1].text)

With output:

1.1

         If applicable, do  link  to switch one to other mode.

Feel free to use .strip() to remove unwanted spaces at the start and at the end.
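
As a rough sketch of that cleanup (same source as above, variable names are my own): .strip() only trims the ends, so the double spaces around "link" remain. If you also want those collapsed, splitting on whitespace and re-joining is one common approach.

```python
import bs4

source = """
<td style="padding-right: 10px;" valign="top">1.1</td>
<td valign="top">
         If applicable, do <a href="url"> link </a> to switch one to other mode.<br/>
</td>
"""

soup = bs4.BeautifulSoup(source, "html.parser")
cell = soup.find_all("td")[1]

# Trim leading/trailing whitespace only; inner double spaces are kept.
stripped = cell.text.strip()

# Collapse every run of whitespace (including inner ones) to a single space.
normalized = " ".join(cell.text.split())

print(stripped)
print(normalized)
```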

Caridorc