Most efficient way of finding a specific text out of string in JS?

Question

I'm looking for the most efficient way to find and return specific pieces of text from a very large string in JS.

Each piece of text I need starts with "ID_" and ends with ".pdf".

Suppose I have a string like this (a much shorter version of the real one):

<ul>
<li><a href="/questions/237104/ID_2556.pdf">Click here to
download.</a></li>
<li><a href="/questions/237104/ID_37.pdf">Click
here to download.</a></li>
<li><a
href="/questions/237104/ID_29997.pdf">Click here to download.</a></li>
<li><a href="/questions/237104/ID_0554.pdf">Click here to
download.</a></li>
</ul>

The script should return these separate values as strings:

ID_2556.pdf

ID_37.pdf

ID_29997.pdf

ID_0554.pdf

OK, what have you tried and why did it not work for you? And what is with the `json` tag? There doesn't seem to be any JSON here — VLAZ, Nov 17 '18 at 23:31
How exactly do you define "most efficient / lightest"? What have you tried and have you measured it against your metric? — Ingo Bürk, Nov 17 '18 at 23:31
Do you actually have a Javascript string of HTML? Or do you have that HTML content in a page? — jfriend00, Nov 17 '18 at 23:31
@GeorgeJempty why did you remove the `node.js` tag? It seems relevant here — VLAZ, Nov 17 '18 at 23:41

score 2 · Accepted Answer · answered Nov 17 '18 at 23:55

You can get all matching strings with String.prototype.match:

var html = `
<ul>
<li><a href="/questions/237104/ID_2556.pdf">Click here to
download.</a></li>
<li><a href="/questions/237104/ID_37.pdf">Click
here to download.</a></li>
<li><a
href="/questions/237104/ID_29997.pdf">Click here to download.</a></li>
<li><a href="/questions/237104/ID_0554.pdf">Click here to
download.</a></li>
</ul>
`;

// Escape the dot so the pattern only matches names that end in a literal ".pdf"
console.log(html.match(/ID_.*?\.pdf/g));
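One caveat with the match call above: with the g flag, match returns null rather than an empty array when nothing is found. Here is a minimal sketch of a wrapper that handles that. The helper name findPdfNames and the \d+ restriction are my assumptions (the sample IDs are all numeric), not something the question specifies.

```javascript
// Hypothetical helper: returns every "ID_<digits>.pdf" name found in text.
// Assumption: IDs are numeric, as in the sample links from the question.
function findPdfNames(text) {
  // match() with the g flag returns null when there is no match,
  // so fall back to an empty array to keep the return type stable.
  return text.match(/ID_\d+\.pdf/g) || [];
}

const sample =
  '<a href="/questions/237104/ID_2556.pdf">x</a> ' +
  '<a href="/questions/237104/ID_37.pdf">y</a>';

console.log(findPdfNames(sample));     // [ 'ID_2556.pdf', 'ID_37.pdf' ]
console.log(findPdfNames('no links')); // []
```

Using \d+ instead of .*? also keeps a match from running across unrelated text between an "ID_" and a later ".pdf", but only if the numeric-ID assumption holds for your data.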

score 1 · Answer 2 · answered Nov 17 '18 at 23:32

You might want to use a regex for this task, for example /ID_.*?\.pdf/gm:

Here is a playground: https://regex101.com/r/mD5Yt3/1

It will generate code for you:

const regex = /ID_.*?\.pdf/gm;
const str = `<ul>
<li><a href="/questions/237104/ID_2556.pdf">Click here to
download.</a></li>
<li><a href="/questions/237104/ID_37.pdf">Click
here to download.</a></li>
<li><a
href="/questions/237104/ID_29997.pdf">Click here to download.</a></li>
<li><a href="/questions/237104/ID_0554.pdf">Click here to
download.</a></li>
</ul>`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
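If your runtime supports String.prototype.matchAll (ES2020), the exec loop above can be written more compactly. This is only a sketch: the helper name findPdfMatches, the capture group, and the numeric-ID assumption are mine, not part of the generated code.

```javascript
// Hypothetical helper: collects each match with its numeric ID and position.
// Assumption: IDs are numeric, as in the sample links from the question.
function findPdfMatches(text) {
  const pattern = /ID_(\d+)\.pdf/g; // matchAll throws a TypeError without the g flag
  return Array.from(text.matchAll(pattern), (m) => ({
    name: m[0],     // full match, e.g. "ID_29997.pdf"
    id: m[1],       // first capture group, e.g. "29997"
    index: m.index, // offset of the match within text
  }));
}

const snippet = '<li><a href="/questions/237104/ID_29997.pdf">Click here</a></li>';
for (const { name, id } of findPdfMatches(snippet)) {
  console.log(`${name} (id ${id})`); // ID_29997.pdf (id 29997)
}
```

Because matchAll works on an internal copy of the regex, there is no lastIndex bookkeeping, and this pattern cannot produce zero-width matches anyway.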

score 0 · Answer 3 · answered Nov 17 '18 at 23:32

One option is to use DOMParser to turn the HTML string into a document, select the <a> elements whose href ends in .pdf, check which ones match the desired format, and push those to an array:

const htmlStr = `<ul>
<li><a href="/questions/237104/ID_2556.pdf">Click here to
download.</a></li>
<li><a href="/questions/237104/ID_37.pdf">Click
here to download.</a></li>
<li><a
href="/questions/237104/ID_29997.pdf">Click here to download.</a></li>
<li><a href="/questions/237104/ID_0554.pdf">Click here to
download.</a></li>
</ul>`;

const doc = new DOMParser().parseFromString(htmlStr, 'text/html');
const filenames = [...doc.querySelectorAll('a[href$=".pdf"]')]
  .reduce((filenames, { href }) => {
    const match = href.match(/ID_\d+\.pdf/);
    if (match) filenames.push(match[0]);
    return filenames;
  }, []);
console.log(filenames);

You could also do all the filtering inside the reduce rather than in the selector string. That shortens the code, though it might be slightly less efficient:

const filenames = [...doc.querySelectorAll('a')]
  ...

Would that work on Node.js though? Judging by the OP's tags, we can't assume that he would have access to the DOM. — customcommander, Nov 17 '18 at 23:37
