-5

I'm looking for the most efficient way to find and return specific substrings from a very large string in JS.

Each substring I want starts with "ID_" and ends with ".pdf".

Suppose I have a string like this (a much shorter version of the real one):

<ul>
<li><a href="/questions/237104/ID_2556.pdf">Click here to
download.</a></li>
<li><a href="/questions/237104/ID_37.pdf">Click
here to download.</a></li>
<li><a
href="/questions/237104/ID_29997.pdf">Click here to download.</a></li>
<li><a href="/questions/237104/ID_0554.pdf">Click here to
download.</a></li>
</ul>

The script should return these separate values as strings:

ID_2556.pdf

ID_37.pdf

ID_29997.pdf

ID_0554.pdf

CoreDo
  • 3
    OK, what have you tried and why did it not work for you? And what is with the `json` tag? There doesn't seem to be any JSON here – VLAZ Nov 17 '18 at 23:31
  • 3
    How exactly do you define "most efficient / lightest"? What have you tried and have you measured it against your metric? – Ingo Bürk Nov 17 '18 at 23:31
  • 1
    Do you actually have a Javascript string of HTML? Or do you have that HTML content in a page? – jfriend00 Nov 17 '18 at 23:31
  • 3
    @GeorgeJempty why did you remove the `node.js` tag? It seems relevant here – VLAZ Nov 17 '18 at 23:41

3 Answers

2

You can get all matching strings with String.prototype.match:

var html = `
<ul>
<li><a href="/questions/237104/ID_2556.pdf">Click here to
download.</a></li>
<li><a href="/questions/237104/ID_37.pdf">Click
here to download.</a></li>
<li><a
href="/questions/237104/ID_29997.pdf">Click here to download.</a></li>
<li><a href="/questions/237104/ID_0554.pdf">Click here to
download.</a></li>
</ul>
`;

// The dot before "pdf" is escaped so that it only matches a literal "."
console.log(html.match(/ID_.*?\.pdf/g));
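One detail worth keeping in mind: with the g flag, String.prototype.match returns null rather than an empty array when nothing matches. A minimal sketch of a wrapper that smooths this over follows; the name findPdfNames and the sample strings are hypothetical and only for illustration.

```javascript
// Hypothetical helper: returns every "ID_<something>.pdf" substring,
// or an empty array when the text contains none.
function findPdfNames(text) {
  // match() with the g flag returns null on no match, so fall back to [].
  return text.match(/ID_.*?\.pdf/g) || [];
}

// Made-up sample input, shortened from the question's HTML.
const sample =
  '<a href="/questions/237104/ID_2556.pdf">x</a> ' +
  '<a href="/questions/237104/ID_37.pdf">y</a>';

console.log(findPdfNames(sample));          // [ 'ID_2556.pdf', 'ID_37.pdf' ]
console.log(findPdfNames('no links here')); // []
```

Callers can then iterate over the result without a separate null check.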
customcommander
1

You can use a regular expression for this task, for example /ID_.*?\.pdf/gm (the m flag is not needed here, since the pattern contains no ^ or $ anchors):

Here is a playground: https://regex101.com/r/mD5Yt3/1

It will generate code for you:

const regex = /ID_.*?\.pdf/gm;
const str = `<ul>
<li><a href="/questions/237104/ID_2556.pdf">Click here to
download.</a></li>
<li><a href="/questions/237104/ID_37.pdf">Click
here to download.</a></li>
<li><a
href="/questions/237104/ID_29997.pdf">Click here to download.</a></li>
<li><a href="/questions/237104/ID_0554.pdf">Click here to
download.</a></li>
</ul>`;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }

    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}
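If only the matched strings are needed, the exec loop can be shortened. The sketch below assumes an environment that supports String.prototype.matchAll (ES2020); the variable names and the shortened sample string are made up for illustration. Because this pattern can never match an empty string, the zero-width guard from the generated code is not required here.

```javascript
// matchAll requires the g flag on the regular expression.
const pdfPattern = /ID_.*?\.pdf/g;

// Shortened, made-up sample based on the question's HTML.
const markup =
  '<li><a href="/questions/237104/ID_29997.pdf">Click here to download.</a></li>\n' +
  '<li><a href="/questions/237104/ID_0554.pdf">Click here to\ndownload.</a></li>';

// Each iteration result is a match array; index 0 holds the full match.
const names = Array.from(markup.matchAll(pdfPattern), (m) => m[0]);

console.log(names); // [ 'ID_29997.pdf', 'ID_0554.pdf' ]
```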
Maxim Mazurok
0

One option would be to use DOMParser to turn the HTML string into a document, then select the `a` elements whose href ends in .pdf, check which ones match the desired format, and push the matching filenames to an array:

const htmlStr = `<ul>
<li><a href="/questions/237104/ID_2556.pdf">Click here to
download.</a></li>
<li><a href="/questions/237104/ID_37.pdf">Click
here to download.</a></li>
<li><a
href="/questions/237104/ID_29997.pdf">Click here to download.</a></li>
<li><a href="/questions/237104/ID_0554.pdf">Click here to
download.</a></li>
</ul>`;

const doc = new DOMParser().parseFromString(htmlStr, 'text/html');
const filenames = [...doc.querySelectorAll('a[href$=".pdf"]')]
  .reduce((filenames, { href }) => {
    const match = href.match(/ID_\d+\.pdf/);
    if (match) filenames.push(match[0]);
    return filenames;
  }, []);
console.log(filenames);
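Note that DOMParser is a browser API and is not a Node.js global. The filtering step itself does not depend on the DOM, so it can be sketched on a plain array of href strings. The helper name pickFilenames and the sample hrefs below are hypothetical, and flatMap is swapped in for reduce as one possible alternative.

```javascript
// Hypothetical helper: keeps only hrefs containing "ID_<digits>.pdf"
// and returns just the filename part of each.
function pickFilenames(hrefs) {
  return hrefs.flatMap((href) => {
    const match = href.match(/ID_\d+\.pdf/);
    // Returning [] drops the entry; returning [x] keeps x.
    return match ? [match[0]] : [];
  });
}

// Made-up sample hrefs; the middle one does not follow the ID_ format.
const sampleHrefs = [
  '/questions/237104/ID_2556.pdf',
  '/questions/237104/notes.pdf',
  '/questions/237104/ID_37.pdf',
];

console.log(pickFilenames(sampleHrefs)); // [ 'ID_2556.pdf', 'ID_37.pdf' ]
```

In the browser, the same helper could be fed the href values collected from querySelectorAll.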

You could also do all the filtering inside the reduce callback rather than in the selector string. That shortens the code, though it might be slightly less efficient:

const filenames = [...doc.querySelectorAll('a')]
  ...
CertainPerformance