
I am trying to convert a string containing an HTML tag into plain text using Python.

Here is the content I'm trying to convert:

htmltxt = "<b>Hello World</b>"

The result should look like Hello World in bold, but instead I'm getting

<html><body><b>Hello World</b></body></html>

with the code below:

from bs4 import BeautifulSoup

htmltxt = "<b>Hello World</b>"
soup = BeautifulSoup(htmltxt, 'lxml')
print(soup)  # <html><body><b>Hello World</b></body></html>

Can anyone suggest how to do this conversion?

Ivar
user123
  • This post provides the solution to your question https://stackoverflow.com/questions/328356/extracting-text-from-html-file-using-python – amrit2 Dec 16 '21 at 08:20

2 Answers


In this situation you're trying to find a tag within your soup object. Since it is the only tag and it has no id or class name, you can use:

from bs4 import BeautifulSoup

soup = BeautifulSoup("<b>Hello World</b>", 'lxml')
hello_world_tag = soup.find("b")
hello_world_tag_text = hello_world_tag.text
print(hello_world_tag_text)  # Output: Hello World

The key here is '.text'. Finding a specific tag with Beautiful Soup returns the entire tag, markup included, but the '.text' property returns only the text inside that tag.
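To see the tag and its text side by side, here is a small sketch. It swaps in Python's built-in "html.parser" for 'lxml' (my assumption, only to avoid the extra dependency); for this input the tag found is the same.

```python
from bs4 import BeautifulSoup

htmltxt = "<b>Hello World</b>"
# "html.parser" ships with Python, so lxml is not required here
soup = BeautifulSoup(htmltxt, "html.parser")

tag = soup.find("b")
print(str(tag))  # <b>Hello World</b>  (the whole tag, markup included)
print(tag.text)  # Hello World         (only the text inside the tag)
print(tag.name)  # b                   (the tag's name)
```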

Edit (following a comment):

I would still recommend using bs4 to parse the HTML. Once you have the text, if you want it shown in bold in a terminal, you can print it with ANSI escape codes:

print('\033[1m' + hello_world_tag_text + '\033[0m')
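One caveat: '\033[1m' on its own only switches bold on, so everything printed afterwards stays bold until the terminal sees the reset code '\033[0m'. A minimal sketch that always appends the reset, assuming a terminal that understands ANSI escape codes (older Windows consoles may print the codes literally); the helper name bold is hypothetical:

```python
BOLD = "\033[1m"   # ANSI SGR code: bold / increased intensity
RESET = "\033[0m"  # ANSI SGR code: reset all attributes


def bold(text: str) -> str:
    """Hypothetical helper: wrap text in ANSI bold on/off codes."""
    return f"{BOLD}{text}{RESET}"


hello_world_tag_text = "Hello World"  # e.g. the value of soup.find("b").text
styled = bold(hello_world_tag_text)
print(styled)                   # Hello World, bold on ANSI-capable terminals
print("this line is not bold")  # the reset keeps later output normal
```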
Sam
  • Can you suggest any other method to convert the characters to bold without using bs4? That is, using only htmltxt = "<b>Hello World</b>". – user123 Dec 16 '21 at 08:40
  • I've just edited to show how to convert to bold. I'd still recommend using bs4 to parse HTML; regex could be the other option, however. – Sam Dec 16 '21 at 08:51
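Since regex came up as the alternative, here is a rough sketch of converting the bold tags without bs4, using only the standard library re and html modules. It assumes simple, well-formed <b>/<strong> tags and terminal (ANSI) output; the helper name html_bold_to_ansi is hypothetical. A regex is not an HTML parser, so nested bold tags or malformed markup will not be handled correctly, which is why bs4 remains the safer choice.

```python
import html
import re

BOLD, RESET = "\033[1m", "\033[0m"

# Matches <b>...</b> or <strong>...</strong>, case-insensitively.
# The non-greedy .*? stops at the first matching closing tag,
# so nested bold tags are NOT handled correctly.
BOLD_TAG = re.compile(r"<(b|strong)>(.*?)</\1>", re.IGNORECASE | re.DOTALL)


def html_bold_to_ansi(markup: str) -> str:
    """Hypothetical helper: <b>/<strong> become ANSI bold, other tags are dropped."""
    styled = BOLD_TAG.sub(lambda m: f"{BOLD}{m.group(2)}{RESET}", markup)
    # Remove any remaining tags such as <i> or <p>
    without_tags = re.sub(r"<[^>]+>", "", styled)
    # Turn entities like &amp; back into plain characters
    return html.unescape(without_tags)


htmltxt = "<b>Hello World</b>"
print(html_bold_to_ansi(htmltxt))
```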

Note: you won't get a bold string per se; bold always has to be produced by whatever interprets or formats the text (a terminal, a browser, a Markdown renderer).

To extract the text from an HTML string with BeautifulSoup, you can use the text property or call the get_text() method:

from bs4 import BeautifulSoup

htmltxt = "<b>Hello World</b>"
soup = BeautifulSoup(htmltxt, 'lxml')

print(soup.text)        # Hello World
print(soup.get_text())  # Hello World
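For this input the two are equivalent: .text returns the same thing as get_text() called with its default arguments. get_text() additionally accepts a separator and a strip flag, which matter once the text is spread over several tags. A small sketch; the second HTML string is made up for illustration, and "html.parser" is used instead of 'lxml' only to avoid the extra dependency:

```python
from bs4 import BeautifulSoup

htmltxt = "<b>Hello World</b>"
soup = BeautifulSoup(htmltxt, "html.parser")
print(soup.text)        # Hello World
print(soup.get_text())  # Hello World

# Text spread over several tags (illustrative input)
multi = BeautifulSoup("<p><b>Hello</b> <i>World</i></p>", "html.parser")
print(multi.get_text())                           # Hello World
# strip=True trims each piece and drops whitespace-only pieces,
# then the remaining pieces are joined with the separator
print(multi.get_text(separator="|", strip=True))  # Hello|World
```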
HedgeHog
  • Thanks, I understand. Can you suggest any other method to convert the characters to bold without using bs4? That is, using only htmltxt = "<b>Hello World</b>". – user123 Dec 16 '21 at 08:41
  • It depends on what you want to do: just print it (then format it, e.g. `print(f'\033[1m{soup.text}\033[0m')`), output it to HTML (then no changes are needed), output it to Markdown, and so on. Please add the details of what you want to do to your question to get more targeted answers. That would be great, thanks – HedgeHog Dec 16 '21 at 09:47
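As an illustration of the "output to Markdown" case: the parsed soup can be rewritten so that bold tags become Markdown's **...** markers, and the Markdown renderer (not Python) then displays the text in bold. A minimal sketch, assuming only <b> and <strong> need handling; bold_to_markdown is a hypothetical name, and "html.parser" is used to avoid the lxml dependency:

```python
from bs4 import BeautifulSoup


def bold_to_markdown(markup: str) -> str:
    """Hypothetical helper: rewrite <b>/<strong> tags as Markdown **bold**."""
    soup = BeautifulSoup(markup, "html.parser")
    for tag in soup.find_all(["b", "strong"]):
        # Replace the whole tag with its text wrapped in Markdown bold markers
        tag.replace_with(f"**{tag.get_text()}**")
    # All remaining markup is dropped; only the text (with ** markers) is kept
    return soup.get_text()


print(bold_to_markdown("<b>Hello World</b>"))  # **Hello World**
```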