RegEx with JavaScript matches more than it should

Question

Fairly simple HTML (the ellipses indicate that there's more code):

...Profile">
 Some text
 </a>...

Using an online RegEx tester for JavaScript (http://regexpal.com/), I can extract "Some text" (note that it contains newlines) with the following expression:

(?=Profile">)[\s\S]*(?=</a)

(Unfortunately, look-behinds are not supported by JavaScript, so I also extract Profile"> and remove it later.) The problem, however, is that the code below

var ShowContent = document.getElementById(id);
ShowContent = ShowContent.innerHTML;
var patt3=/Profile">[\s\S]*(?=<)/;
var GetName=patt3.exec(ShowContent);
alert(GetName);

doesn't extract what the online tester shows. Instead, the match also includes all of the HTML that follows "Some text" (i.e., not only the closing </a but everything after it).

Does anyone have any suggestions?

http://stackoverflow.com/questions/10008839/why-use-dom-to-parse-webpages-instead-of-regex — Andreas, Apr 15 '12 at 19:30
*Does anyone have any suggestions?* - Yes. Not using regex to parse HTML would be a pretty good start. — Tomalak, Apr 15 '12 at 19:38
Thanks. Will definitely learn DOM in the future, but I now need a temporary solution. — mrinterested, Apr 15 '12 at 19:50

score 2 · Accepted Answer · answered Apr 15 '12 at 19:38

When you're certain that the supplied string does not contain possible pitfalls (e.g. <input value='Profile">'>), replace [\s\S]* with [^<]* (anything but a <):

var patt3 = /Profile">([^<]*)/;
var getName = patt3.exec(ShowContent);
getName = getName ? getName[1] : ''; // If no match has been found -> empty string

alert(getName);

(I also replaced GetName with getName, because names starting with a capital letter conventionally indicate a constructor. Stick to the conventions, and do not start non-constructor names with a capital.)
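
To see why the original pattern overshoots, here is a minimal sketch that runs both patterns against the same string. The sampleHtml value is made up for illustration (the real page markup is not shown in the question), so treat it as an assumption:

```javascript
// Hypothetical sample markup; the real innerHTML from the question is unknown.
const sampleHtml =
  '<a href="#" class="Profile">\n Some text\n </a><p>More <b>markup</b></p>';

// Greedy: [\s\S]* first consumes everything to the end of the input and then
// backtracks only as far as the LAST "<", so the match swallows later tags.
const greedy = /Profile">[\s\S]*(?=<)/.exec(sampleHtml);

// Negated class: [^<]* can never cross a "<", so it stops at the first tag.
const bounded = /Profile">([^<]*)/.exec(sampleHtml);
const name = bounded ? bounded[1].trim() : ''; // trim() drops the newlines

console.log(JSON.stringify(greedy[0])); // includes </a><p>More <b>markup</b>
console.log(JSON.stringify(name));      // "Some text"
```

The trim() call is an addition of this sketch; drop it if the surrounding whitespace matters.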

Answered by Rob W

What's the point in using regex when there *already is* a DOM (for free!) that you can use to extract a node's value? – Tomalak Apr 15 '12 at 19:43
@Tomalak The question did not include enough information to post an answer regarding DOM traversal. It did, however, contain clear conditions for finding the text. – Rob W Apr 15 '12 at 19:45
@Rob W Thank you so much! Yes, I will use this as a temporary solution, but will spend time to learn DOM for later updates of my home code. – mrinterested Apr 15 '12 at 19:49
Yes, that's right. That would call for some clarification. It can't be difficult to do, though. Looks like the text of a single certain link should be extracted. – Tomalak Apr 15 '12 at 19:50

score 0 · Answer 2 · answered Apr 15 '12 at 22:28

You would probably be better off making the quantifier ungreedy. Try this regex:

/Profile">([\s\S]*?)(?=<)/
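">

As a rough illustration of the ungreedy version, the sketch below applies it to a made-up string standing in for the innerHTML from the question (the html value and the variable names are assumptions, not taken from the original page):

```javascript
// Hypothetical input; only the 'Profile">' marker and the text come from the question.
const html = '...Profile">\n Some text\n </a><span>after</span>';

// Lazy: [\s\S]*? expands one character at a time and stops as soon as the
// lookahead (?=<) succeeds, i.e. right before the FIRST "<" after the marker.
const lazyMatch = /Profile">([\s\S]*?)(?=<)/.exec(html);

// exec() returns null when nothing matches, so guard before reading group 1.
const lazyText = lazyMatch ? lazyMatch[1].trim() : '';

console.log(JSON.stringify(lazyText)); // "Some text"
```

For this input the lazy pattern and [^<]* capture the same text, since both stop at the first <.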

Answered by Niet the Dark Absol
