
The variable htmlStr may contain the id attribute written in different ways:

var htmlStr = '<div id="demo_div"></div>';

var htmlStr = "<div id='demo_div'></div>";

var htmlStr = '<div id=demo_div class="demo"></div>';

var htmlStr = "<div id=demo_div></div>";

How can I write this without so many nested try-catch blocks? Can the patterns be combined into one? The code below works, but it does not look pretty.

var idname;
try {
    idname = /(id="(.*?)(\"))/g.exec(htmlStr)[2]
} catch (e) {
    try {
        idname = /(id='(.*?)(\'))/g.exec(htmlStr)[2]
    } catch (e) {
        try {
            idname = /(id=(.*?)(\ ))/g.exec(htmlStr)[2]
        } catch (e) {
            try {
                idname = /(id=(.*?)(\>))/g.exec(htmlStr)[2]
            } catch (e) {
                console.log(e);
            }
        }
    }
}

console.log(idname);
still
  • `exec` doesn't throw an error if no match is found – Luca Kiebel Sep 26 '18 at 10:30
  • Try `(id=['"]?(.*?)["'> ])` – Nambi_0915 Sep 26 '18 at 10:33
  • You need something like `/id=(?:(["'])([^'"]*)\1|([^\s>]*))/g`, loop over all matches by calling `exec` until no match, and only grab either Group 2 or Group 3 (if Group 3 matched). But it is safer to use a DOM parser to parse HTML. – Wiktor Stribiżew Sep 26 '18 at 10:40
  • ...how about NOT using regex. Take the HTML string, make it into an actual element, check its ID. https://stackoverflow.com/questions/2522422/converting-a-javascript-string-to-a-html-object – VLAZ Sep 26 '18 at 10:42
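
As the first comment notes, `exec` returns `null` rather than throwing; the question's code only reaches each `catch` because reading `[2]` from `null` throws a `TypeError`. Below is a minimal sketch of the single-pattern approach from Wiktor Stribiżew's comment, with a `null` check in place of the nested try-catch blocks. `extractIds` is a hypothetical helper name, and like any regex this is not a real HTML parser: for example, it would also pick up the value of a `data-id` attribute.

```javascript
// Returns every id value found in an HTML string, or an empty array.
function extractIds(htmlStr) {
  // Pattern from the comment: group 1 is the quote character, group 2 the
  // quoted value, group 3 an unquoted value (no whitespace or '>').
  const pattern = /id=(?:(["'])([^'"]*)\1|([^\s>]*))/g;
  const ids = [];
  let match;
  // exec() returns null once there are no more matches, so a plain loop
  // replaces the nested try/catch blocks.
  while ((match = pattern.exec(htmlStr)) !== null) {
    // Group 3 is set only for unquoted values; otherwise use group 2.
    ids.push(match[3] !== undefined ? match[3] : match[2]);
  }
  return ids;
}

console.log(extractIds('<div id=demo_div class="demo"></div>')); // [ 'demo_div' ]
```

With this helper, `extractIds(htmlStr)[0]` gives the first ID, or `undefined` when the string has none.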

2 Answers


You can do this without using regex by simply parsing the HTML.

const htmlStrings = [
  '<div id="demo_div"></div>',
  "<div id='demo_div'></div>",
  "<div id=demo_div class='demo'></div>",
  '<div data-id="not_a_real_id"></div>', //note: doesn't have an ID 
  "<div data-id=not_an_id ID= demo_div></div>", 
  "<div id= demo_div><span id=inner_id></span></div>"
];

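function getId(html) {
  // A detached <div> used only as a parsing container.
  const parser = document.createElement('div');
  parser.innerHTML = html;

  // firstElementChild skips leading whitespace text nodes;
  // optional chaining returns undefined if no element was parsed.
  return parser.firstElementChild?.id;
}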

htmlStrings.forEach(x => console.log(getId(x)));

As you can see, you can create an element, put the HTML in it, then grab the first child and check its ID. This works even if there is a similarly named attribute, such as a custom data-id attribute, if the ID attribute uses any capitalisation, or if the div contains inner elements.

This technique won't work with invalid HTML, or if you want the IDs of multiple elements; it is only meant as a demonstration. Once the string is parsed into a proper element, you can traverse its hierarchy as you see fit and perform whatever extraction you need.

VLAZ
  • Thank you, that's the safest option for me and it fits, because I need valid HTML anyway. – still Sep 26 '18 at 14:12
/id=["']?([^\s"'>]+)/g

This will match all four examples.
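
One way to check this is to run the pattern against the question's four strings. In this sketch `firstId` is a hypothetical helper, and the `g` flag is dropped on purpose: a global regex object keeps its `lastIndex` between `exec` calls, so reusing it across several strings can skip matches. Note that the pattern also matches inside attributes such as `data-id`.

```javascript
// Pattern from this answer; without the g flag, exec() always starts at index 0.
const idPattern = /id=["']?([^\s"'>]+)/;

// Hypothetical helper: returns the first captured id, or null if none.
function firstId(html) {
  const match = idPattern.exec(html);
  // exec() returns null when nothing matches, so check before indexing.
  return match ? match[1] : null;
}

const examples = [
  '<div id="demo_div"></div>',
  "<div id='demo_div'></div>",
  '<div id=demo_div class="demo"></div>',
  '<div id=demo_div></div>',
];

examples.forEach((html) => console.log(firstId(html))); // demo_div (four times)
```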


Lipsum
  • With /\sid=["']?([^\s"'>]+)/g the other cases work too. I still use it inside a try-catch, because there may be no ID at all and I want to catch that as an error. – still Sep 26 '18 at 16:15
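
For the case described in that comment, the error can be raised explicitly by a small helper instead of relying on indexing `null` to throw a `TypeError`. This is a minimal sketch assuming the `\s` variant of the pattern; `getIdOrThrow` is a hypothetical name, and the `g` flag is omitted because only the first match is needed.

```javascript
// Requiring whitespace before "id" keeps attributes such as data-id
// from matching. The g flag is omitted: only the first match is needed.
const ID_PATTERN = /\sid=["']?([^\s"'>]+)/;

// Hypothetical helper: returns the id value, or throws when there is none,
// so one try/catch at the call site handles the missing-id case.
function getIdOrThrow(html) {
  const match = ID_PATTERN.exec(html);
  if (match === null) {
    throw new Error(`No id attribute found in: ${html}`);
  }
  return match[1];
}

try {
  console.log(getIdOrThrow('<div data-id="not_an_id" id=demo_div></div>')); // demo_div
  console.log(getIdOrThrow('<div class="demo"></div>')); // throws
} catch (e) {
  console.log(e.message); // No id attribute found in: <div class="demo"></div>
}
```

Throwing a descriptive `Error` keeps the single try-catch the commenter wants, while the message says what actually went wrong.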