
I'm working on a Node.js script (launched from CMD with the node command) that receives some HTML content as a string, from which I need to extract the data inside a specific <div> element. I'm having a hard time figuring out why this portion of code doesn't give me the desired output.

const input = '<div class="some_class">Some data</div><div class="some_other_class">< class="some_other_other_class">...</div></div>'
const regex = new RegExp(/<div class="some_class"\>(.*?)<\/div>/g)
let obj = {
    'tmp': input.search(regex),
}
console.log(obj) // outputs { tmp: 0}
console.log(input.search(/<div class="some_class"\>(.*?)<\/div>/g)) // outputs 0
 
const x = input.search(/<div class="some_class"\>(.*?)<\/div>/g)
console.log(x) // outputs 0

I know this looks like a common question here, but I tried passing the regex as a string (between single quotes '), as a regex literal (between / delimiters), and finally by creating a new RegExp object, all without success. I always get 0 as the output.

However, when I test it in an online tool, it does match and captures the desired data in group #1: https://www.regextester.com/?fam=131034

I don't know if I'm missing something or doing something wrong, but after some hours spent on this issue, I'm struggling to think straight.

jijihbt
    [I'd recommend *(pretty strongly)* against doing anything with HTML/XML/etc. with regex.](https://stackoverflow.com/a/1732454/438992) Use a parser. I mean... it matches starting at char 0; what do you expect to happen? I.e., insert some random, non-matching chars at the beginning and see what happens to the result. – Dave Newton Jun 07 '23 at 19:01

1 Answer


String::search() returns the index at which the first match starts, which is 0 in your case, so the output is perfectly right. You need String::match() instead, and don't forget to pick the right capture group index:

const input = '<div class="some_class">Some data</div><div class="some_other_class">< class="some_other_other_class">...</div></div>'

console.log(input.match(/<div class="some_class">(.*?)<\/div>/)?.[1])
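As a side-by-side illustration of the difference, here is a minimal sketch that runs search(), match() and matchAll() against the same input. The variable names (pattern, position, captured, allCaptured) are only for illustration and are not from the question.

```javascript
// Same input string as in the question.
const input = '<div class="some_class">Some data</div><div class="some_other_class">< class="some_other_other_class">...</div></div>';
const pattern = /<div class="some_class">(.*?)<\/div>/;

// search() returns the index where the match starts, or -1 when nothing matches.
const position = input.search(pattern);

// match() without the g flag returns [fullMatch, group1, ...], or null.
const captured = input.match(pattern)?.[1];

// With the g flag, match() returns only the full matches and drops the groups;
// matchAll() keeps the groups for every match.
const globalPattern = new RegExp(pattern.source, 'g');
const allCaptured = [...input.matchAll(globalPattern)].map((m) => m[1]);

console.log(position);    // 0
console.log(captured);    // Some data
console.log(allCaptured); // [ 'Some data' ]
```

The 0 from search() therefore means "found at the very start of the string", not "not found"; a failed search would give -1.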

To avoid dealing with groups, I sometimes prefer to use lookaround assertions:

const input = '<div class="some_class">Some data</div><div class="some_other_class">< class="some_other_other_class">...</div></div>'

console.log(input.match(/(?<=<div class="some_class">).*?(?=<\/div>)/)?.[0])
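Because the lookbehind and lookahead consume no characters, the full match (index 0) is already the inner text. A minimal sketch of wrapping that in a reusable function follows; innerTextOfDiv is a hypothetical helper name, and it assumes the div has exactly one class attribute written as in the input.

```javascript
// Hypothetical helper (not from the answer): returns the text inside the first
// <div class="..."> with the given class, or null when nothing matches.
function innerTextOfDiv(html, className) {
  // Escape regex metacharacters in the class name before building the pattern.
  const escaped = className.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // Lookbehind asserts the opening tag, lookahead asserts the closing tag.
  const pattern = new RegExp(`(?<=<div class="${escaped}">).*?(?=</div>)`);
  const match = html.match(pattern);
  return match ? match[0] : null;
}

const input = '<div class="some_class">Some data</div><div class="some_other_class">< class="some_other_other_class">...</div></div>';

console.log(innerTextOfDiv(input, 'some_class'));    // Some data
console.log(innerTextOfDiv(input, 'missing_class')); // null
```

The lazy .*? stops at the first </div>, so content containing nested divs would be cut short, and . does not match newlines unless the s flag is added.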

If your HTML changes often, I recommend using jsdom (https://www.npmjs.com/package/jsdom) so that you can access the content inside the tags you need through the DOM.

Alexander Nenashev