
Does JSSoup (which itself states "JavaScript + BeautifulSoup = JSSoup") support a select() operation similar to Beautiful Soup or JSoup to select elements based on a CSS selector?

I could not find it. Does it perhaps exist under a different name?

Markus Weninger
  • Maybe, since you are using native js, you could use `querySelector` and `querySelectorAll`? – Mr. Polywhirl Dec 29 '20 at 11:33
  • @Mr.Polywhirl I have a string that contains HTML which I want to analyze, not the page's DOM itself. – Markus Weninger Dec 29 '20 at 11:35
  • **Note:** _[Naming Style](https://github.com/chishui/JSSoup#naming-style): JSSoup tries to use the same interfaces as BeautifulSoup so BeautifulSoup user can use JSSoup seamlessly. However, JSSoup uses Javascript's camelCase naming style instead of Python's underscore naming style. Such as `find_all()` in BeautifulSoup is replaced as `findAll()`._ – Mr. Polywhirl Dec 29 '20 at 11:36
  • JSSoup now supports `select()` for CSS selectors. – chishui Oct 11 '21 at 04:26

3 Answers


With JSSoup, you cannot query elements using CSS selectors the way you can with querySelector and querySelectorAll.

Here is the findAll definition in JSSoup:

{
  key: 'findAll',
  value: function findAll() {
    var name = arguments.length > 0 && arguments[0] !== undefined ? arguments[0] : undefined;
    var attrs = arguments.length > 1 && arguments[1] !== undefined ? arguments[1] : undefined;
    var string = arguments.length > 2 && arguments[2] !== undefined ? arguments[2] : undefined;
    // ...
    var strainer = new SoupStrainer(name, attrs, string);
    // ...
  }
}

And here is the SoupStrainer constructor:

function SoupStrainer(name, attrs, string) {
  _classCallCheck(this, SoupStrainer);

  if (typeof attrs == 'string') {
    attrs = { class: [attrs] };
  } else if (Array.isArray(attrs)) {
    attrs = { class: attrs };
  } else if (attrs && attrs.class && typeof attrs.class == 'string') {
    attrs.class = [attrs.class];
  }
  if (attrs && attrs.class) {
    for (var i = 0; i < attrs.class.length; ++i) {
      attrs.class[i] = attrs.class[i].trim();
    }
  }
  this.name = name;
  this.attrs = attrs;
  this.string = string;
}

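The first argument is the tag name (pass null or undefined to match any tag), and the second is the attributes. As the constructor shows, a string passed as the attributes is treated as a class name, and an array of strings as a list of class names. The optional third argument, string, filters by text, as the 'Second' example below shows.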

Example usage

const JSSoup = require('jssoup').default;

const html = `
<html>
  <head>
    <title>Hello World</title>
  </head>
  <body>
    <h1>Hello World</h1>
    <p class="foo">First</p>
    <p class="foo bar">Second</p>
    <div class="foo">Third</div>
  </body>
</html>
`;

const printTags = (tags) => console.log(tags.map(t => t.toString()).join(' '));

const soup = new JSSoup(html);

printTags(soup.findAll('p', 'foo'));
// <p class="foo">First</p> <p class="foo bar">Second</p>

printTags(soup.findAll('p', { class: 'foo' }));
// <p class="foo">First</p> <p class="foo bar">Second</p>

printTags(soup.findAll('p', { class: 'foo' }, 'Second'));
// <p class="foo bar">Second</p>

printTags(soup.findAll('p', { class: ['foo', 'bar'] }));
// <p class="foo bar">Second</p>

printTags(soup.findAll(null, 'bar'));
// <p class="foo bar">Second</p>
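For simple cases, you can translate a CSS selector into findAll arguments yourself. The helper below, selectorToFindAllArgs, is hypothetical and not part of JSSoup: it is a minimal sketch that only handles a single compound selector made of an optional tag name plus class names (p, .bar, p.foo.bar) and throws on anything else, such as descendant combinators, ids, or attribute selectors.

```javascript
// Hypothetical helper (not part of JSSoup): turn a simple compound selector
// such as "p", ".bar" or "p.foo.bar" into [name, attrs] for findAll().
function selectorToFindAllArgs(selector) {
  const trimmed = selector.trim();
  // Optional tag name followed by zero or more ".class" parts, nothing else.
  const match = /^([a-zA-Z][\w-]*)?((?:\.[\w-]+)*)$/.exec(trimmed);
  if (trimmed === '' || match === null) {
    throw new Error(`Unsupported selector: "${selector}"`);
  }
  // match[1] is undefined when there is no tag name, meaning "any tag".
  const name = match[1];
  const classes = match[2].split('.').filter((c) => c.length > 0);
  // No classes: leave attrs undefined so only the tag name is used.
  const attrs = classes.length > 0 ? { class: classes } : undefined;
  return [name, attrs];
}

const args = selectorToFindAllArgs('p.foo.bar');
// args is ['p', { class: ['foo', 'bar'] }]
// With the soup from the example above: printTags(soup.findAll(...args));
```

Spreading the result into findAll relies on the class-matching behavior shown in the examples above; anything beyond tag and class names still needs a real selector engine.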
Mr. Polywhirl

From the documentation, it appears to be called find or findAll depending on whether you want to find one or many. Here's an example they give:

var data = `
<div>
  <p> hello </p>
  <p> world </p>
</div>
`
var soup = new JSSoup(data);
soup.find('p')
// <p> hello </p>

Looking at the source, I don't see any CSS selector support. It does show, however, that find and findAll accept more than one argument, and the BeautifulSoup documentation has an example that uses the second argument to filter by class. The same works in JSSoup, e.g.:

const JSSoup = require('jssoup').default;
const data = `
<div>
    <p class="foo bar"> hello </p>
    <p> world </p>
</div>
`
const soup = new JSSoup(data);
console.log(soup.find('p', 'foo').toString()); // Logs: <p class="foo bar"> hello </p>

The second argument can be used for other attributes as well, but CSS selectors don't seem to be an option.
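To make that filtering rule concrete, here is a minimal re-implementation of the attribute check in plain JavaScript. It is a hypothetical sketch, not JSSoup's source: it assumes, consistent with the class: ['foo', 'bar'] example in the first answer, that every requested class must be present among the element's classes, and that any other attribute must match exactly.

```javascript
// Hypothetical sketch (not JSSoup's source): does an element's attribute
// map satisfy a findAll()-style attrs filter?
function matchesAttrs(elementAttrs, wanted) {
  if (wanted === undefined) return true; // no filter: every element matches
  for (const [key, value] of Object.entries(wanted)) {
    if (!Object.prototype.hasOwnProperty.call(elementAttrs, key)) return false;
    if (key === 'class') {
      // Split the class attribute into tokens and require every
      // requested class to be present (assumed semantics).
      const have = elementAttrs.class.trim().split(/\s+/);
      const need = Array.isArray(value) ? value : [value];
      if (!need.every((c) => have.includes(c))) return false;
    } else if (elementAttrs[key] !== value) {
      return false; // any other attribute must match exactly
    }
  }
  return true;
}

const second = { class: 'foo bar', id: 'second' };
console.log(matchesAttrs(second, { class: 'foo' }));          // true
console.log(matchesAttrs(second, { class: ['foo', 'bar'] })); // true
console.log(matchesAttrs(second, { class: 'baz' }));          // false
console.log(matchesAttrs(second, { id: 'second' }));          // true
```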

Alternatively, you could use jsdom, which implements the standard DOM APIs, including querySelector and querySelectorAll:

const { JSDOM } = require("jsdom");
const data = `
<div>
    <p class="foo bar"> hello </p>
    <p> world </p>
</div>
`;
const dom = new JSDOM(data);
const doc = dom.window.document;
console.log(doc.querySelector(".foo").outerHTML); // Logs: <p class="foo bar"> hello </p>
T.J. Crowder
  • This seems to work only with tag names, but for example not with CSS classes. For example `mySoup.findAll('.myClass')` does return an empty array, even though I have elements with the class `myClass`. – Markus Weninger Dec 29 '20 at 11:36
  • @MarkusWeninger - Looks like it's *slightly* more than just tag name, but yeah, not full CSS selectors. I wouldn't like to use this API. :-) I'd probably go for [jsdom](https://www.npmjs.com/package/jsdom) or similar, which has `querySelector` and `querySelectorAll`, etc. – T.J. Crowder Dec 29 '20 at 11:53
  • Thank you for the hint with _jsdom_! – Markus Weninger Dec 29 '20 at 12:09

Building on the existing answers: you can also search by class name alone (without a tag name) by passing undefined as the tag name to find() and findAll():

mySoup.findAll(undefined, 'myClass');
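Conceptually, an undefined tag name means "any tag". The sketch below illustrates that idea on a hypothetical plain-object tree ({ name, attrs, children }); the node shape and the findAllByNameAndClass function are illustrative only, not JSSoup's node classes or traversal code.

```javascript
// Hypothetical node shape for illustration only (not JSSoup's classes):
// { name: 'p', attrs: { class: 'foo bar' }, children: [...] }
function findAllByNameAndClass(root, name, className) {
  const results = [];
  const visit = (node) => {
    const classes = (node.attrs.class || '').trim().split(/\s+/);
    // An undefined name matches any tag, mirroring findAll(undefined, 'myClass').
    const nameOk = name === undefined || node.name === name;
    if (nameOk && classes.includes(className)) results.push(node);
    node.children.forEach(visit); // depth-first, document order
  };
  visit(root);
  return results;
}

const tree = {
  name: 'body', attrs: {}, children: [
    { name: 'p', attrs: { class: 'myClass' }, children: [] },
    { name: 'div', attrs: { class: 'other myClass' }, children: [] },
    { name: 'span', attrs: {}, children: [] },
  ],
};

console.log(findAllByNameAndClass(tree, undefined, 'myClass').map((n) => n.name)); // [ 'p', 'div' ]
console.log(findAllByNameAndClass(tree, 'p', 'myClass').map((n) => n.name));       // [ 'p' ]
```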
Markus Weninger