
I want to make a script that will parse the HTML of the current page, filtering out certain div classes, and for now either write their contents to a file or remove everything else on the page.

I guess the best way would be to run a Tampermonkey script on that page. I looked on http://userscripts-mirror.org/ but didn't find a script like this.

Is there a JavaScript HTML parser that can run in Chrome?

Something that could work like this maybe?

 var divClasses = parseCurrentPage("div class x");
 // then do something on divClasses and then show only them
shinzou
  • Asking for off-site resources is explicitly off topic. – Charles McKelvey Sep 23 '16 at 19:10
  • What do you mean by "parse the html of the current page"? You should be able to use `.querySelectorAll()` or `.getElementsByClassName()` – guest271314 Sep 23 '16 at 19:10
  • Try pulling the full page source and using regex maybe? – Brydenr Sep 23 '16 at 19:11
  • @Brydenr I hoped to automate the part where I pull the html locally, I also hoped there's already a tool that can filter div classes. – shinzou Sep 23 '16 at 19:13
  • _"I also hoped there's already a tool that can filter div classes."_ Does `document.getElementsByClassName()` not return expected result? – guest271314 Sep 23 '16 at 19:13
  • It may be too heavy, but a web browser automation tool like Selenium may be what you want, though I hesitate to endorse a (free) product on Stack Overflow. – Brydenr Sep 23 '16 at 19:15
  • Have you tried using .html() to pull out a copy of the whole thing, then .remove() or .removeChild() to get rid of the divs? – SilentLupin Sep 23 '16 at 19:15
  • @SilentLupin I'm afraid I'm new to js and even newer to jquery so I'll have to google those.. – shinzou Sep 23 '16 at 19:22
  • @guest271314 looks like `document.getElementsByClassName()` might work. – shinzou Sep 23 '16 at 19:22

3 Answers


_"filtering out certain div classes"_

You can use `document.getElementsByClassName()`:

var elements = document.getElementsByClassName(names); // or:
var elements = rootElement.getElementsByClassName(names);
  • `elements` is a live `HTMLCollection` of found elements.
  • `names` is a string representing the list of class names to match; class names are separated by whitespace.
  • `getElementsByClassName` can be called on any element, not only on the document. The element on which it is called will be used as the root of the search.
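
A minimal sketch of how this could cover the "show only them" part of the question, assuming the class is called `x` and that replacing the whole contents of `<body>` is acceptable. The helper name `keepOnlyClass` is made up for illustration, and the document is passed in as a parameter so the function can also be tried outside a page.

```javascript
// Hypothetical helper: leave only the elements with `className` in the body.
function keepOnlyClass(doc, className) {
  // The HTMLCollection is live, so copy it into a static array before
  // changing the DOM.
  const matches = Array.from(doc.getElementsByClassName(className));

  // Skip matches nested inside another match; they come along with
  // their ancestor and should not be moved a second time.
  const topLevel = matches.filter(
    (el) => !matches.some((other) => other !== el && other.contains(el))
  );

  // Empty the body, then re-attach only the elements we kept.
  doc.body.innerHTML = '';
  topLevel.forEach((el) => doc.body.appendChild(el));
  return topLevel;
}

// In a userscript or the console (the class name 'x' is an assumption):
// keepOnlyClass(document, 'x');
```

The returned array can also be used to collect text, for example `kept.map(el => el.textContent)`, if the goal is to save the contents instead of showing them.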
guest271314

jQuery can do all of this and more. I would recommend reading up on it at https://learn.jquery.com/

Once you have included jQuery, a simple selector that grabs all div elements looks like this: `var divClasses = $('div');`. If you only want certain div elements, you can narrow the selector by class, id, and/or parent/hierarchy level. Read more at https://api.jquery.com/category/selectors/

Then, after you have done what you want with the div elements, you can use jQuery again to show only them, using jQuery's `append` function.

Simply call `append` on the parent HTML element and pass it the div element you want to append: `$('selector-to-grab-divs-parent-html').append(myDiv);`. If you need to grab the parent of one of the div elements, you can use jQuery's `parent` function.
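
As a rough sketch of the append approach described above, assuming jQuery is already loaded on the page: `showOnly` is a hypothetical helper name and `'div.x'` is a placeholder selector. It relies only on jQuery's `detach`, `empty`, and `append`, and takes `$` as a parameter so it works with whichever jQuery instance is available.

```javascript
// Hypothetical helper: keep only the elements matching `selector` on the page.
function showOnly($, selector) {
  // detach() takes the elements out of the DOM but keeps their data
  // and event handlers, so they survive the empty() call below.
  const $matches = $(selector).detach();

  // Clear the body, then put the detached elements back as its children.
  $('body').empty().append($matches);
  return $matches;
}

// With jQuery loaded (the selector is an assumption):
// showOnly(jQuery, 'div.x');
```

Note that if one matching div is nested inside another, both end up as direct children of the body, so a narrower selector may be needed in that case.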

splay
  • Nice, can you also work on the contents of `var divClasses` with jquery? something like a `removeif("div has a certain word")` – shinzou Sep 23 '16 at 19:28
  • Most definitely! If you mean a certain word in the text of the div, you can use jQuery's `text` function (api.jquery.com/text) and then JavaScript's `str.indexOf` to check whether the div's text contains the word you are looking for. If it does, you can remove the div with jQuery's `remove` (api.jquery.com/remove): `var myDivsText = myDiv.text(); if (myDivsText.indexOf("wordLookingFor") >= 0) { myDiv.remove(); }` – splay Sep 23 '16 at 19:50
  • When I try to run `var divClasses = $('div');` in chrome's console, it either returns `undefined` or throws `Tried to get element with id of "%s" but it is not present on the page.` I did try to include it in several ways like here: http://stackoverflow.com/questions/7474354/include-jquery-in-the-javascript-console maybe I'll open another question... – shinzou Sep 23 '16 at 20:02
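
For the "remove if the div has a certain word" idea from the comments, a plain-DOM sketch that does not need jQuery might look like this. `removeIfContains` is a hypothetical name; it uses `textContent`, `String.prototype.includes`, and `Element.remove()`, and accepts any array-like list of elements, such as the result of `getElementsByClassName`.

```javascript
// Hypothetical helper: remove every element whose text contains `word`.
function removeIfContains(elements, word) {
  const removed = [];
  // Snapshot first: a live HTMLCollection shrinks while elements are removed.
  for (const el of Array.from(elements)) {
    if (el.textContent.includes(word)) {
      el.remove();
      removed.push(el);
    }
  }
  return removed;
}

// On a page (class name and word are assumptions):
// removeIfContains(document.getElementsByClassName('x'), 'wordLookingFor');
```

The match is a case-sensitive substring test, so searching for `now` would also hit `know`; a regular expression with word boundaries could be swapped in if that matters.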

I believe you can do this easily with jQuery: just get the content of <body> and run your queries with jQuery.

Mr Lister
Alexander_F