Does JSSoup support extracting text?

Question

Does JSSoup support extracting text, similar to Beautiful Soup's soup.findAll(text=True)?

The documentation does not cover this use case, but it seems to me that there should be a way.

To clarify, what I want is to grab all visible text from the page.

HedgeHog · Accepted Answer · 2021-11-24T18:43:08.563

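In Beautiful Soup you can extract text in different ways: with find_all(text=True), but also with .get_text() or .text.

JSSoup works similarly to Beautiful Soup. To extract all visible text, call .getText() or .text on your soup (note the camelCase name; .get_text() is the Beautiful Soup spelling). .string is also available, but it only returns a value when the tag contains a single string.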

Example (jssoup)

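var JSSoup = require('jssoup').default;

var soup = new JSSoup('<html><head><body>text<p>ptext</p></body></head></html>');
// Join all text nodes with a separator
soup.getText('|')
// 'text|ptext'

// Split on the same separator to get an array
soup.getText('|').split('|')
// ['text','ptext']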

Example (beautiful soup)

from bs4 import BeautifulSoup
html = '''<html><head><body>text<p>ptext</p></body></head></html>'''

soup = BeautifulSoup(html, "html.parser")
# Join all text nodes with '|', then split back into a list
print(soup.get_text('|').split('|'))

Output

['text', 'ptext']
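
One caveat with the join-and-split approach: it would split in the wrong place if the page text itself contained the separator. As a rough illustration of collecting the text nodes straight into a list (roughly what find_all(text=True) yields), here is a minimal sketch that uses only Python's standard-library html.parser. The names TextCollector and extract_texts are hypothetical, skipping script and style content is my own assumption about what "visible" means, and this is not how Beautiful Soup or JSSoup implement it.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers non-empty text nodes into a list."""

    # Assumption: text inside these tags is not considered visible.
    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.texts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside skipped tags.
        text = data.strip()
        if text and not self._skip_depth:
            self.texts.append(text)


def extract_texts(html):
    """Return the text nodes of an HTML string as a list."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


html = "<html><head><body>text<p>ptext</p></body></head></html>"
print(extract_texts(html))
# ['text', 'ptext']
```

Because each text node is appended separately, a string such as 'a|b' stays in one piece instead of being cut at the separator.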

edited Nov 24 '21 at 18:43

answered Nov 21 '21 at 07:55

HedgeHog

Thanks @HedgeHog, however soup.findAll returns an array, which makes further processing easier for me. – Miki Nov 24 '21 at 17:41
Okay, an array is just a step away: `soup.get_text('|').split('|')` should convert your text into an array. – HedgeHog Nov 24 '21 at 18:22
Sounds about right. The syntax in JSSoup is slightly different though: soup.getText('|').split('|'). If you post this as a response, I will gladly accept it. – Miki Nov 24 '21 at 18:31
Sounds fair, I edited the answer and added it as an example. – HedgeHog Nov 24 '21 at 18:48