
I have the following JavaScript code:

<script type="text/javascript"> //javascript starts

 var patt=/[<](\S+).*>(.*)<\/\1>/;
 var str='<a id="test">hi</a> <p></p>';

 alert(str.match(patt));
 alert(patt.exec(str));

</script>

It is expected to find all tags in an HTML document, so ideally it should return <a id="test">hi</a>, <p></p>.

But it currently returns <a id="test">hi</a>, a, hi.

Why is this happening?

Another question: what is the difference between str.match(patt) and patt.exec(str), and which is better to use?

gopi1410
  • 2
    Why don't you use a DOM parser instead of regex? – gion_13 Jun 04 '12 at 16:28
  • 1
    Well, jQuery is a very good DOM parser, besides other things – gion_13 Jun 04 '12 at 16:29
  • here's a more detailed answer on why not to use regex to parse html : http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags . Besides, with jQuery, all you have to do to select all the `a` elements in the document is : `$('a');` – gion_13 Jun 04 '12 at 16:31
  • 1
    But what I need here is an html parser not a dom parser. I am reading html input from a file –  Jun 04 '12 at 16:31
  • 2
    dom stands for "document object model", which is the browser's internal representation of the xml/html string code. jQuery handles that too : `$('your valid html string here...').find('a');` – gion_13 Jun 04 '12 at 16:33
  • 1
    Using jQuery, your query would simply be `$(htmlString).find("*");` – RustyTheBoyRobot Jun 04 '12 at 16:38

3 Answers

var patt=/[<](\S+).*>(.*)<\/\1>/g;

Try specifying the global modifier, g; without it, matching stops at the first occurrence found.

About your second question, MDN is a good resource.
From https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/match:

If the regular expression does not include the g flag, returns the same result as regexp.exec(string). If the regular expression includes the g flag, the method returns an Array containing all matches. If there were no matches, the method returns null.
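
As a minimal sketch of the behavior quoted above, the snippet below runs the pattern from the question against the question's own string, once without and once with the g flag. The variable names single and all are only illustrative.

```javascript
// Same input string as in the question.
var str = '<a id="test">hi</a> <p></p>';

// Without g: the first match followed by its capture groups, same as exec().
var single = str.match(/[<](\S+).*>(.*)<\/\1>/);
console.log(Array.from(single)); // ['<a id="test">hi</a>', 'a', 'hi']

// With g: every whole match, and no capture groups.
var all = str.match(/[<](\S+).*>(.*)<\/\1>/g);
console.log(all); // ['<a id="test">hi</a>', '<p></p>']
```

This is why the original code printed a and hi: they are the two parenthesised groups of the single match, not additional tags.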

Fabrizio Calderan

You need to append the global modifier, g, to your regex: /[<](\S+).*>(.*)<\/\1>/g.

  • If you don't use the g global modifier, match and exec will each return an array containing the whole first match in the string as the first element, followed by any parenthesised match patterns within that match as subsequent array elements.

  • If you do use the g modifier, both match and exec can reach every match in the string. match returns all of them at once as an array of whole matches. exec still returns one array per call (the match followed by its match patterns, as it does without g), but each successive call returns the next match until all matches have been reported (see below for an example).

In general, I would recommend match over exec, because exec relies on the regular expression maintaining state (specifically, lastIndex, the index of the string where the match should resume). I find this to be detrimental if you want to use a regular expression on multiple strings:

var reg = /\w/g;
reg.exec("foo"); // ["f"]
reg.exec("foo"); // ["o"]
reg.exec("bar"); // ["r"] -- does not start at the beginning of the string

Compare that to the match behavior:

var reg = /\w/g;
"foo".match(reg); // ["f", "o", "o"]
"bar".match(reg); // ["b", "a", "r"]
// we can now use the arrays to get individual matches

However, if you need to get the parenthesised match patterns for each match in a global search, you must use exec, since global application of match only gets a list of whole matches, not the match patterns within those matches.

// the ending digit is a match pattern
var reg = /\w(\d)/g;

// match only gets list of whole matches
"d1b4h7".match(reg); // ["d1","b4","h7"] 

// exec gets the match and the match pattern
reg.exec("d1b4h7"); // ["d1","1"]
reg.exec("d1b4h7"); // ["b4","4"]
reg.exec("d1b4h7"); // ["h7","7"]

In conclusion, it sounds like you want to use match with a global modifier, since you don't need match pattern information. If you do actually need match pattern information, get all matches by using a loop to repeatedly call exec, until exec returns null instead of an array.
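
The loop described above could look roughly like the following sketch. execAll is a hypothetical helper name, not a built-in; it works on a copy of the regular expression so that the caller's lastIndex is not disturbed.

```javascript
// Hypothetical helper: collect every match together with its capture groups.
function execAll(pattern, text) {
  // Copy the regex (adding the g flag if it is missing) so the caller's
  // lastIndex stays untouched.
  var flags = pattern.flags.indexOf("g") === -1 ? pattern.flags + "g" : pattern.flags;
  var re = new RegExp(pattern.source, flags);
  var results = [];
  var m;
  // exec returns null once there are no more matches.
  while ((m = re.exec(text)) !== null) {
    // A zero-length match does not advance lastIndex, so step past it manually.
    if (m[0] === "") re.lastIndex++;
    results.push(Array.from(m));
  }
  return results;
}

var pairs = execAll(/\w(\d)/g, "d1b4h7");
console.log(pairs); // [['d1','1'], ['b4','4'], ['h7','7']]
```

Each inner array has the same shape as a single exec result: the whole match first, then the match patterns.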

apsillers

Try this:

var patt=/<\S+[^>]*>[^<]*<\/\S+>/g;

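The extra a and hi are there because you had them as capture groups. This pattern has no capture groups, so it will only return tags. Its one flaw is that it does not check that the closing tag matches the opening one, so it will also match <begin>dfgdf</diffEnd>.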
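
A quick check of that pattern, as a sketch: the first two calls use the pattern exactly as written above, and the last one is an assumed variant (not part of this answer) that reuses the \1 backreference from the question to reject mismatched closing tags.

```javascript
var patt = /<\S+[^>]*>[^<]*<\/\S+>/g;

// Finds both tags from the question's string.
var tags = '<a id="test">hi</a> <p></p>'.match(patt);
console.log(tags); // ['<a id="test">hi</a>', '<p></p>']

// The flaw: mismatched opening and closing tags are accepted.
var flawed = '<begin>dfgdf</diffEnd>'.match(patt);
console.log(flawed); // ['<begin>dfgdf</diffEnd>']

// Assumed variant: capture the tag name and require the same name in the closing tag.
var strict = /<(\S+)[^>]*>[^<]*<\/\1>/g;
var rejected = '<begin>dfgdf</diffEnd>'.match(strict);
console.log(rejected); // null
```

With the g flag, match still returns only whole matches, so the capture group in the variant does not add extra entries to the result.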

Knyri
  • Also: http://www.perlfect.com/articles/regextutor.shtml is a great resource for writing regex pieces. – Knyri Jun 04 '12 at 16:56