
I'm making a bookmarklet for a website that needs to parse multiple pages. I tried DOMParser, but it gives an error with the XML option and returns null with the HTML option. I also tried jQuery, but I'm sure it uses DOMParser somewhere along the way. Parsing works correctly in PHP, but I'd rather not have to make twice as many requests to each page.

I'm looking for a standalone javascript plugin to parse xml or html.

Thanks!

mowwwalker
  • See http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – ChrisF Feb 22 '12 at 23:07
  • Can you post the code you're using with DOMParser? – pyramation Feb 22 '12 at 23:22
  • @ChrisF, I'm not looking for regex, just a parser. I haven't the slightest clue what happens behind the scenes in PHP's or JavaScript's built-in parsers, but are you saying it's not possible without regex, and not practical with it? – mowwwalker Feb 23 '12 at 00:01

1 Answer


Can you not just "parse" the HTML by using the DOM?

If you need to process multiple pages from the current page, load the other pages in an iframe and then access their DOM with something like `document.getElementsByTagName('iframe')[0].contentWindow.document`.
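A rough sketch of the iframe approach, assuming the other pages are on the same origin as the page running the bookmarklet (the browser blocks reading a cross-origin frame's document). `readViaIframe` is a hypothetical helper name, and the optional third argument only exists so the host document can be swapped out for testing:

```javascript
// Sketch only: assumes `url` is on the same origin as the current page,
// because the browser blocks reading a cross-origin frame's document.
// readViaIframe is a hypothetical helper, not part of any library.
function readViaIframe(url, callback, hostDoc) {
  hostDoc = hostDoc || document; // host page document (overridable for tests)
  var frame = hostDoc.createElement('iframe');
  frame.style.display = 'none'; // keep the helper frame invisible

  frame.onload = function () {
    // The loaded page's DOM, readable like any other document.
    var doc = frame.contentWindow.document;
    callback(doc);
    // Clean up once the callback has finished with the document.
    frame.parentNode.removeChild(frame);
  };

  frame.src = url;
  hostDoc.body.appendChild(frame); // navigation starts once the frame is inserted
}
```

Usage might look like `readViaIframe('/page2.html', function (doc) { var title = doc.title; });`, where `/page2.html` stands in for whatever page you need. Keep in mind that the frame loads the page fully, including its images and scripts.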

EDIT: If you want to avoid loading the external files of the other pages and executing their scripts, use Ajax (XMLHttpRequest) to get each page. For each page, use code like `var newdiv = document.createElement('div'); newdiv.innerHTML = ajaxcontent;` and then use the DOM to read content from `newdiv`. If you don't append `newdiv` to the page, this should be just as lightweight as using DOMParser.
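The Ajax approach might be sketched like this. It assumes same-origin pages (XMLHttpRequest is subject to the same-origin policy), and `parseDetached` / `fetchAndParse` are hypothetical names. The container `<div>` is deliberately never appended to the page:

```javascript
// Sketch only: assumes the pages share the bookmarklet page's origin,
// since XMLHttpRequest is subject to the same-origin policy.
// parseDetached and fetchAndParse are hypothetical helper names.

// Turn an HTML string into a <div> that is never attached to the page.
// Scripts inserted through innerHTML do not run, but <img> elements in
// the markup may still start downloading.
function parseDetached(html, doc) {
  doc = doc || document; // overridable for tests
  var container = doc.createElement('div');
  container.innerHTML = html;
  return container;
}

// Fetch one page and pass its detached container to `callback`.
function fetchAndParse(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.onreadystatechange = function () {
    if (xhr.readyState === 4 && xhr.status === 200) {
      callback(parseDetached(xhr.responseText));
    }
  };
  xhr.send(null);
}
```

For example, `fetchAndParse('/page2.html', function (div) { var links = div.getElementsByTagName('a'); });`, with `/page2.html` as a placeholder. Because the markup goes into a `<div>`, the parser should drop the page's own `<html>`, `<head>` and `<body>` tags while keeping their contents.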

DG.
  • No, this needs to be done for multiple pages and would be very resource heavy. – mowwwalker Feb 25 '12 at 19:01
  • Could you elaborate on what makes my suggested solution less appropriate than yours? The only downside I see compared with your solution is that loading the other pages will also load their external files and scripts. In that case, see my updated answer after "EDIT". – DG. Feb 26 '12 at 02:03
  • ... Loading the content into the DOM would mean loading all the pictures, all the media, etc. – mowwwalker Feb 26 '12 at 06:57
  • Ah. It seems you are correct. I thought that if you did not append newdiv to the page, the images and other external scripts would not be loaded. You could use a regex to disable all `src` attributes before inserting the content into newdiv. A bit messy, but it should work and keeps things fast. The simplest way might be something like `ajaxcontent.replace(/ src\=/gi, ' xsrc=')` – DG. Feb 26 '12 at 10:18
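A slightly more defensive take on that one-liner, as an untested sketch (`neutralizeSrc` is a made-up name). It renames each `src` attribute to `xsrc` so that assigning the markup to `innerHTML` gives the browser nothing to fetch, and it accepts any whitespace before `src` instead of exactly one space:

```javascript
// Hypothetical helper generalizing the one-liner above: rename every `src`
// attribute to `xsrc` so that assigning the markup to innerHTML does not
// make the browser fetch images or external scripts. Matching any
// whitespace (not just a single space) also catches attributes that
// follow a tab or newline, or have spaces before the "=".
function neutralizeSrc(html) {
  return html.replace(/(\s)src\s*=/gi, '$1xsrc=');
}
```

It is still a textual rewrite, so it will also change any ` src=` that happens to appear in text or inline scripts, and it leaves things like `data-src` and CSS `url(...)` alone. The original URL stays readable afterwards via `element.getAttribute('xsrc')`.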