9

Ok, I think I need to repost my question that was originally:

Javascript Regex group multiple

with a full example. I have:

        var text = ""+ 
            "<html>                           " +
            "  <head>                         " +
            "  </head>                        " +
            "  <body>                         " +
            "    <g:alert content='alert'/>   " +
            "    <g:alert content='poop'/>    " +
            "  </body>                        " +
            "</html>";

        var regex = /<([a-zA-Z]*?):([a-zA-Z]*?)\s([\s\S]*?)>/m;
        var match = regex.exec( text );
        console.log(match)

Output from console.log is:

[Image: output from console.log]

The problem is that I am only getting the first result, not the others. What can I do to capture and walk over everything that matched?

mjs
  • 12
    PS: do not use regex to parse HTML. – m0skit0 Feb 05 '13 at 12:18
  • Do you have a better idea of doing what I am trying to do? That is get the tags of which can really look as – mjs Feb 05 '13 at 12:19
  • What _are_ you trying to do? what results do you need, exactly? – Cerbrus Feb 05 '13 at 12:21
  • 1
    Instead of regex you should use DOM functions to achieve this. – leftclickben Feb 05 '13 at 12:22
  • 2
    Basically, regex is the wrong tool for your purpose because you are dealing with nested structures, i.e. recursion, and a regular expression cannot handle that. To see why, you should first understand that a finite automaton (the data structure underlying a regular expression) has no memory apart from the state it is in. If you have arbitrarily deep nesting, you need an arbitrarily large automaton, which contradicts the notion of a finite automaton. – StarPinkER Feb 05 '13 at 12:25
  • That is ridiculous. I see some advice and no reason or explanation provided! Explain yourselves, don't just throw out comments. (The previous comment excepted.) – kidwon Feb 05 '13 at 12:25
  • @Cerbrus ... I want to get the tags, but I do not think I will end up with a valid HTML structure that a DOM parser could use... I am trying to build a template engine in the style of JSP or GSP... so regex is fine, since I am planning to provide a mechanism to precompile it all only once – mjs Feb 05 '13 at 12:28
  • [Do not use regex to parse HTML](http://stackoverflow.com/a/1732454/1048572) – Bergi Feb 05 '13 at 12:59
  • Funny post... but it all depends on what I am trying to do. Nested structures are not really allowed in my case. – mjs Feb 05 '13 at 13:19

2 Answers

17

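exec returns only ONE result at a time. When the regex has the g (global) flag, exec also sets the regex's lastIndex to the end of that match, so the next call continues from there. Therefore, if you want to get ALL matches, use a while loop with a regex that has the g flag: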

while ((match = regex.exec( text )) != null)
{
    console.log(match);
}

To get all matches in one shot, use text.match(regex) with the g (global) flag set on the regex. With the g flag, match finds every match of the regex in the string and returns the matched strings in an array, without the capture groups.

[edit] and that's why my example HAD a g flag set! [/eoe]

var text = ""+ 
           "<html>                           " +
           "  <head>                         " +
           "  </head>                        " +
           "  <body>                         " +
           "    <g:alert content='alert'/>   " +
           "    <g:alert content='poop'/>    " +
           "  </body>                        " +
           "</html>";

// Note the g flag
var regex = /<([a-zA-Z]*?):([a-zA-Z]*?)\s([\s\S]*?)>/gm;

var match = text.match( regex );
console.log(match);

SIMPLE TEST:

<button onclick="myFunction()">Try it</button>

<script>
function myFunction()
{
    var text = ""+
               "<html>                           " +
               "  <head>                         " +
               "  </head>                        " +
               "  <body>                         " +
               "    <g:alert content='alert'/>   " +
               "    <g:alert content='poop'/>    " +
               "  </body>                        " +
               "</html>";

    // Note the g flag
    var regex = /<([a-zA-Z]*?):([a-zA-Z]*?)\s([\s\S]*?)>/gi;

    var n = text.match( regex );
    alert(n);
}
</script>

working perfectly...
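A minimal sketch of how the two approaches above differ in what they return, assuming a shortened version of the sample text. With the g flag, `match` gives back only the matched substrings, while an `exec` loop yields one result per call with its capture groups. The helper name `collectTags` and the property names `namespace`, `name` and `attributes` are made up for illustration and are not part of the answer.

```javascript
// Shortened sample text (an assumption; same kind of tags as in the question).
var text = "<body> <g:alert content='alert'/> <g:alert content='poop'/> </body>";
var regex = /<([a-zA-Z]*?):([a-zA-Z]*?)\s([\s\S]*?)>/g;

// match with the g flag: an array of the matched substrings only.
var whole = text.match(regex);

// exec in a loop: one result per call, each carrying its capture groups.
// collectTags is a hypothetical helper name.
function collectTags(re, input) {
    if (!re.global) {
        // Without the g flag exec never advances, so the loop would not end.
        throw new Error("collectTags needs a regex with the g flag");
    }
    var tags = [];
    var m;
    re.lastIndex = 0; // start from the beginning of the input
    while ((m = re.exec(input)) !== null) {
        tags.push({ namespace: m[1], name: m[2], attributes: m[3] });
    }
    return tags;
}

var tags = collectTags(regex, text);
console.log(whole);
console.log(tags);
```

If only the matched strings are needed, `match` is the shorter route; if the individual groups matter (as for a template engine that needs the tag name and attributes), the `exec` loop is probably the better fit.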

itsid
  • Sounds promising but unfortunately didn't work. I have edited your answer to give you a full example of what I tried... – mjs Feb 05 '13 at 12:33
  • @Hamidam: Of course it fails (goes into an infinite loop), since the text is matched from the start again and again, and it will always return the first item. You don't understand how `match` is different from `exec`. – nhahtdh Feb 05 '13 at 12:53
  • @nhahtdh you are right.. they are different... it appears as if I need the g instead of the m... I will post a final solution soon.. – mjs Feb 05 '13 at 12:57
  • @Hamidam: I edited his post. `exec` and `match` are different. `String.match` will forget everything the next time you call it. `Regex.exec` will remember the position of last match and will continue from there. – nhahtdh Feb 05 '13 at 12:59
  • @nhahtdh Thanks. Some of us use Firebug to run JavaScript immediately rather than create a new page ;) http://i.msdn.microsoft.com/ee819093.image001(en-us,MSDN.10).png – mjs Feb 05 '13 at 13:24
  • No need to create a new page, just copy and paste the code into any online JavaScript test tool like jsFiddle or the one from w3schools: http://www.w3schools.com/jsref/tryit.asp?filename=tryjsref_match_regexp2 – itsid Feb 05 '13 at 14:18
2

This is what works:

        var text = ""+
            "<html>                           " +
            "  <head>                         " +
            "  </head>                        " +
            "  <body>                         " +
            "    <g:alert content='alert'/>   " +
            "    <g:alert content='poop'/>    " +
            "  </body>                        " +
            "</html>";

        var regex = /<([a-zA-Z]*?):([a-zA-Z]*?)\s([\s\S]*?)>/g;
        var match = null;
        while ( (match = regex.exec( text )) != null  )
            console.log(match)

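Notice the g flag, which is necessary: without it, exec starts from the beginning of the string on every call, so the loop keeps returning the first match and never ends.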
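A small sketch of why the flag matters, assuming a short made-up input string (`sample` and the other variable names are illustrative only). It calls `exec` by hand instead of looping, so the version without the g flag can be inspected safely.

```javascript
// Made-up input with two tags.
var sample = "<g:a x='1'/> <g:b y='2'/>";

var withoutG = /<([a-zA-Z]*?):([a-zA-Z]*?)\s([\s\S]*?)>/;
var withG = /<([a-zA-Z]*?):([a-zA-Z]*?)\s([\s\S]*?)>/g;

// Without g: lastIndex is ignored and stays 0, so both calls find the first tag.
var first = withoutG.exec(sample);
var again = withoutG.exec(sample);

// With g: each call resumes at lastIndex, which exec moves past the match.
var g1 = withG.exec(sample);
var afterFirst = withG.lastIndex; // position just after the first tag
var g2 = withG.exec(sample);
var g3 = withG.exec(sample);      // no more matches: null, lastIndex reset to 0

console.log(first[2], again[2]);  // same tag name twice
console.log(g1[2], g2[2], g3);    // two different tag names, then null
```

This is why `while ((match = regex.exec(text)) != null)` terminates only for a regex with the g flag.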

Adriano Repetti
mjs
  • @Adriano, don't censor. I think I am allowed to express myself – mjs Feb 05 '13 at 13:21
  • I guess it's more appropriate to post one (or two, three...) long comments to reply to a discussion. People who have the same problem in the future will find your answer (clean and concise), and if they think it's good they'll upvote. – Adriano Repetti Feb 05 '13 at 13:27