
I iterate through a whole HTML file character by character, and I want to extract the HTML tags.

When I come across '<', I treat it as the start of a tag, and I treat '>' as its end. Of course, JavaScript code inside scripts can contain '<' and '>' as well, so my program mistakes them for tags when they occur. I want to prevent such mistakes.

Is there a regex I could use, or any other idea for doing this? I tried detecting scripts by looking for JavaScript keywords, but I'm not convinced by that approach.
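To make the problem concrete, here is a minimal sketch of the naive approach described above, written as a single regex instead of a character loop (the function name `naiveTags` is hypothetical). A comparison inside a script is reported as if it were a tag:

```javascript
// Naive approach: treat every '<' ... '>' span as a tag.
function naiveTags(html) {
  return html.match(/<[^>]*>/g) || [];
}

const sample = "<script>if (a < b && c > d) {}</script>";
console.log(naiveTags(sample));
// returns ['<script>', '< b && c >', '</script>']; the middle entry is not a tag
```

The bogus `'< b && c >'` entry is exactly the kind of mistake the question wants to avoid.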

K.Rzepecka
  • See [RegEx match open tags except XHTML self-contained tags](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). Use a DOM parser. – Wiktor Stribiżew Mar 18 '17 at 23:03
  • What you are trying to do is very very very difficult, since you can find the characters `<` and `>` in JavaScript code inside: 1) comparisons, 2) strings, 3) comments (inline and multiline), 4) literal regex patterns (good luck dealing with that), 5) bitshift operators, and don't forget possible CSS strings and comments. In short, you can't deal with that using a simple pattern. – Casimir et Hippolyte Mar 18 '17 at 23:09
  • There are existing HTML parsers for JS, why are you attempting to write one from scratch? – zzzzBov Mar 18 '17 at 23:10
  • If it's server side, I'd recommend using `cheerio.js` – dgo Mar 19 '17 at 15:03
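The comments are right that recognizing JavaScript syntax is a dead end. HTML itself sidesteps it: `<script>` and `<style>` are raw text elements, so their content ends only at the matching end tag, and a tokenizer does not need to understand the code inside. Below is a minimal sketch of a character-by-character scanner built on that idea (`extractTags` and `RAW_TEXT_ELEMENTS` are hypothetical names). It is a simplification, not a full HTML5 tokenizer: it ignores the script "escaped" states, RCDATA elements such as `<textarea>` and `<title>`, and CDATA, and it treats any quote inside a tag as opening a quoted value. For real-world input, a DOM parser as suggested above is more robust.

```javascript
// Minimal sketch (NOT a full HTML5 tokenizer): extract tags by scanning
// character by character, while
//  - ignoring a '<' that cannot start a tag (e.g. "1 < 2" in text),
//  - skipping <!-- comments --> entirely,
//  - ignoring '>' inside quoted attribute values,
//  - treating <script>/<style> content as raw text up to the end tag.
const RAW_TEXT_ELEMENTS = ["script", "style"];

function extractTags(html) {
  const tags = [];
  let i = 0;
  while (i < html.length) {
    if (html[i] !== "<" || !/[a-zA-Z\/!?]/.test(html[i + 1] || "")) {
      i++; // plain text, or a '<' that cannot open a tag
      continue;
    }
    if (html.startsWith("<!--", i)) {
      const end = html.indexOf("-->", i + 4);
      i = end === -1 ? html.length : end + 3;
      continue;
    }
    // Find the closing '>', ignoring any '>' inside "..." or '...'
    let j = i + 1;
    let quote = null;
    for (; j < html.length; j++) {
      const ch = html[j];
      if (quote) {
        if (ch === quote) quote = null;
      } else if (ch === '"' || ch === "'") {
        quote = ch;
      } else if (ch === ">") {
        break;
      }
    }
    if (j >= html.length) break; // unterminated tag: stop scanning
    const tag = html.slice(i, j + 1);
    tags.push(tag);
    i = j + 1;
    // After an opening <script> or <style>, jump straight to its end tag,
    // so '<' and '>' inside the code are never mistaken for tags.
    const open = /^<([a-zA-Z][a-zA-Z0-9-]*)/.exec(tag);
    if (open && RAW_TEXT_ELEMENTS.includes(open[1].toLowerCase())) {
      const endTag = new RegExp("</" + open[1].toLowerCase() + "(?=[\\s/>])", "gi");
      endTag.lastIndex = i;
      const m = endTag.exec(html);
      i = m ? m.index : html.length;
    }
  }
  return tags;
}
```

For example, `extractTags('<p>1 < 2</p><script>if (a < b) {}</script>')` returns `['<p>', '</p>', '<script>', '</script>']`: the script body is skipped as a whole instead of being inspected for keywords.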

1 Answer


Sorry, I don't quite understand what you want, but if you want to get one tag or all tags in the HTML together with their code, you can use
`document.querySelectorAll("TagName")[0].outerHTML`, where index `0` means the first match; use a for loop to get all of them one by one. Sorry if this is not what you want.
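A minimal sketch of that suggestion as a loop that collects every match instead of only index `0`. `collectOuterHTML` is a hypothetical helper name, and `root` is assumed to be `document` or any element (anything with `querySelectorAll`). Because the browser has already parsed the page, `<` and `>` inside scripts never turn into elements here.

```javascript
// Collect the outerHTML of every element matching a selector.
// `root` is assumed to be a Document or Element; in a browser,
// querySelectorAll returns a NodeList, which supports length and indexing.
function collectOuterHTML(root, selector) {
  const result = [];
  const elements = root.querySelectorAll(selector);
  // Loop over all matches rather than reading only elements[0]
  for (let i = 0; i < elements.length; i++) {
    result.push(elements[i].outerHTML);
  }
  return result;
}

// Usage in a browser console:
// collectOuterHTML(document, "a");   // every <a> element
// collectOuterHTML(document, "*");   // every element on the page
```

Note that `outerHTML` is re-serialized from the parsed DOM, so whitespace and attribute quoting may differ from the original source text.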

Hzzkygcs