JavaScript regex matching an attribute in tag if another specific attribute is present

Question

I have a string pattern:

<div content="[...]" class="[...]">[...]</div>
<div content="website" [...] class="_type">[...]</div>
<dic content="[...]" class="[...]">[...]</div>

My question is: how can I get the "website" text (the `content` value of the `div` that has `class="_type"`) from this string?

I have tried:

/content="(.+?)".*?class="_type"/g

But the result is not what I expected: [...].
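A likely cause, assuming the HTML sits in a single-line string: the regex engine reports the leftmost match, so it starts at the very first `content="` and the lazy `.*?` is free to run past the end of that tag, across later tags, until it reaches `class="_type"`. The sketch below uses a shortened, hypothetical input (not the asker's exact data) to show this.

```javascript
// Hypothetical one-line input: two tags, only the second has class="_type"
const html = '<div content="first" class="other">x</div>' +
             '<div content="website" class="_type">y</div>';

// The regex from the question
const attempt = /content="(.+?)".*?class="_type"/g;

// The leftmost match starts at the first content=", and .*? then runs
// past the first tag's closing ">" until it finds class="_type"
const firstMatch = attempt.exec(html);
console.log(firstMatch[1]); // "first", not "website"
```

If the string contains line breaks, `.` does not match `\n`, so the outcome depends on how the tags are laid out; that is one more reason the behaviour can look unpredictable.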

[Don't parse HTML with regexes](http://stackoverflow.com/a/1732454/1835379). — Cerbrus, Jul 10 '15 at 10:01
Couldn't agree more with @Cerbrus. jQuery / JavaScript is your friend. — SDekov, Jul 10 '15 at 10:02
You should avoid regular expressions for HTML. You could probably use `.getAttribute("content")`. — npinti, Jul 10 '15 at 10:04
@npinti `getAttribute` isn't going to work if it's a string. ;) — James Donnelly, Jul 10 '15 at 10:05
No, this is a text string. I want a regex in MEAN.io, not jQuery and HTML. — user3129040, Jul 10 '15 at 10:08
@JamesDonnelly: I'm not a JS guy, but doing this: `document.getElementsByClassName("_type")[0].getAttribute("content");` yielded `website`. — npinti, Jul 10 '15 at 10:13
@npinti the question states "I have a string pattern". Sounds like OP has a string containing those 3 `div` elements, but they don't actually exist on the page: `"..."`. — James Donnelly, Jul 10 '15 at 10:16
Thanks for the reply, sir, but I want to use a JavaScript regex for this. Do you have any idea? — user3129040, Jul 10 '15 at 10:16
@James Donnelly: Thanks, sir. My English is bad; I want to say more, but I can't. I just have the example and the code :) — user3129040, Jul 10 '15 at 10:18

Wiktor Stribiżew · Accepted Answer · 2015-07-10T14:13:41.307


Here is a regex that can get that substring.

var re = /<(?=[^<>]*\bclass="_type")div\b[^<>]*content="([^"]*)"/ig;

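The regex matches any `<div` opening tag that contains `content="` and also contains `class="_type"`. The value is stored in capture group 1. Note that `class="_type"` can come either before or after `content="{our string}"`: the lookahead `(?=[^<>]*\bclass="_type")` only scans the current tag (it cannot cross `<` or `>`) and consumes nothing, so the rest of the pattern still matches from just after the `<`.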
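To see that attribute order does not matter, the sketch below runs the answer's regex (without the `g` flag, so each `exec` call starts from the beginning of its input) against three hypothetical tags.

```javascript
// The answer's pattern, minus the g flag, for one-off checks
const re = /<(?=[^<>]*\bclass="_type")div\b[^<>]*content="([^"]*)"/i;

// Hypothetical sample tags
const classFirst = '<div class="_type" content="before">x</div>';
const classLast = '<div content="after" class="_type">x</div>';
const otherClass = '<div content="skipped" class="other">x</div>';

console.log(re.exec(classFirst)[1]); // "before"
console.log(re.exec(classLast)[1]);  // "after"
console.log(re.exec(otherClass));    // null: the lookahead fails
```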

The code can be something like:

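var re = /<(?=[^<>]*\bclass="_type")div\b[^<>]*content="([^"]*)"/ig;
var str = '<div content="[...]" class="[...]">[...]</div>\n<div content="website" [...] class="_type">[...]</div>\n<dic content="[...]" class="[...]">[...]</div>';
var m;

// With the g flag, each exec() call returns the next match and advances re.lastIndex
while ((m = re.exec(str)) !== null) {
    // Guard against an endless loop on a zero-length match
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    // m[1] is capture group 1: the value of the content attribute
    document.getElementById("r").innerHTML += m[1] + "<br/>";
}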

<div id="r"></div>
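The snippet above writes into the page with `document.getElementById`, which only exists in a browser. Since the asker mentioned MEAN.io (a Node.js stack), here is a sketch that returns the captured values as an array instead. `extractTypeContent` is a hypothetical helper name, and `String.prototype.matchAll` assumes an ES2020-capable runtime such as Node 12 or later.

```javascript
// Hypothetical helper: returns every content value from <div> tags that carry class="_type"
function extractTypeContent(html) {
    // matchAll requires the g flag; the regex is the one from the answer
    const re = /<(?=[^<>]*\bclass="_type")div\b[^<>]*content="([^"]*)"/ig;
    const values = [];
    for (const m of html.matchAll(re)) {
        values.push(m[1]); // capture group 1 = the content value
    }
    return values;
}

const str = '<div content="[...]" class="[...]">[...]</div>\n' +
            '<div content="website" [...] class="_type">[...]</div>\n' +
            '<dic content="[...]" class="[...]">[...]</div>';

console.log(extractTypeContent(str)); // [ 'website' ]
```

`matchAll` works on an internal copy of the regex, so there is no `lastIndex` bookkeeping to do by hand.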

If you do not know which delimiters the attribute values use (double quotes, single quotes, or none at all), the task is a bit harder, but still possible:

var re = /<(?=[^<>]*\bclass=['"]?_type\b['"]?)div\b[^<>]*content=(?:["']([^<]*?)["']|(\S+))/ig; 
var str = '<div content="[...]" class="[...]">[...]</div>\n<div content=\'[...]\' class=\'[...]\'>[...]</div>\n<div content="web site" [...] class="_type">[...]</div>\n<dic content="[...]" class="[...]">[...]</div>\n<dic content=[...] class=[...]>[...]</div>\n<dic content=\'[...]\' class=\'[...]\'>[...]</div>\n<div content=\'web site\' [...] class=\'_type\'>[...]</div>\n<div content=website [...] class=_type>[...]</div>';
var m;
 
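// With the g flag, each exec() call returns the next match and advances re.lastIndex
while ((m = re.exec(str)) !== null) {
    // Guard against an endless loop on a zero-length match
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    // Group 1 holds a quoted value, group 2 an unquoted one; only one of them is set
    var value = (m[1] !== undefined) ? m[1] : m[2];
    document.getElementById("e").innerHTML += value + "<br/>";
}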

<div id="e"></div>
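The same browser-free adaptation works for the delimiter-agnostic pattern. In the sketch below (again with a hypothetical helper name, `extractTypeContentAnyQuotes`, and a hypothetical sample string), whichever of group 1 (quoted value) or group 2 (unquoted value) took part in the match is returned.

```javascript
// Hypothetical helper around the answer's delimiter-agnostic regex
function extractTypeContentAnyQuotes(html) {
    const re = /<(?=[^<>]*\bclass=['"]?_type\b['"]?)div\b[^<>]*content=(?:["']([^<]*?)["']|(\S+))/ig;
    const values = [];
    for (const m of html.matchAll(re)) {
        // A group that did not take part in the match is undefined, so fall back to the other one
        values.push(m[1] !== undefined ? m[1] : m[2]);
    }
    return values;
}

const sample = "<div content='web site' class='_type'>a</div>\n" +
               '<div content=website x class=_type>b</div>\n' +
               '<div content="other" class="x">c</div>';

console.log(extractTypeContentAnyQuotes(sample)); // [ 'web site', 'website' ]
```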

edited Jul 10 '15 at 14:13

answered Jul 10 '15 at 10:51

Wiktor Stribiżew

So, what if the target `div` also has another class on it? Like `class="_type someClass"`? – Cerbrus Jul 10 '15 at 11:11
@Cerbrus: It won't match it. The lookahead checks for the existence of `class="_type"` exactly. – Wiktor Stribiżew Jul 10 '15 at 11:26
Which is exactly what I meant. This is why one doesn't process HTML with regexes. There's always something that'll break the regex. – Cerbrus Jul 10 '15 at 11:27
@Cerbrus: I do not understand what you mean at all. The requirement is so clear and simple that even a regex is safe to use here. We only match the "content" if there is `class="_type"`. – Wiktor Stribiżew Jul 10 '15 at 11:28
Assuming no other classes are ever found in the `_type` elements – Cerbrus Jul 10 '15 at 11:30
All I want to say now is thanks, @stribizhev: the code works well whether class="_type" comes before or after content="website". Very good. Although I can read the code, I don't understand it :) – user3129040 Jul 10 '15 at 13:20
I will explain in a little while, sorry, too busy right now. – Wiktor Stribiżew Jul 10 '15 at 13:24
@stribizhev I want to ask one sub-question: what if the attributes in `str = ...` are delimited with `'` as well as `"`? – user3129040 Jul 10 '15 at 13:25
I added a solution to all possible cases. – Wiktor Stribiżew Jul 10 '15 at 14:13
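Regarding Cerbrus's point about `class="_type someClass"`: as Wiktor notes, the first regex checks for `class="_type"` exactly, so such a tag is skipped. One hypothetical tweak, sketched below under the assumption that the class attribute is double-quoted and classes are separated by whitespace, is to let the lookahead accept `_type` as any whitespace-separated token inside the attribute value. It is still a regex over HTML, so the caveats raised in the comments still apply.

```javascript
// Original lookahead: requires the attribute value to be exactly "_type"
const exact = /<(?=[^<>]*\bclass="_type")div\b[^<>]*content="([^"]*)"/i;

// Hypothetical variant: "_type" may be any whitespace-separated token in class="..."
const token = /<(?=[^<>]*\bclass="(?:[^"]*\s)?_type(?:\s[^"]*)?")div\b[^<>]*content="([^"]*)"/i;

const multi = '<div content="website" class="_type someClass">x</div>';
const lookalike = '<div content="nope" class="my_type">x</div>';

console.log(exact.exec(multi));     // null
console.log(token.exec(multi)[1]);  // "website"
console.log(token.exec(lookalike)); // null: "my_type" is not the token "_type"
```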
