0

I'm trying to use regular expressions in JS, but I'm probably missing something because I can't get them to work. I have a couple of regular expressions that work fine in PHP (used with preg_match), but when I use exactly the same expressions in JS, I get no matches.

Here is an example. I'm trying to parse this page:

https://www.coolblue.be/fr/rechercher?query=GTX+1060&trier=prix-les-moins-chers

My code:

var pattern = '/<a class=\"product__title js-product-title\" href=\"(.*)\" data-trackclickevent=\"(.*)\">[\n\r\s]+(.*)[\n\r\s]+<\/a>/gi';
var found = content.match(pattern);

The variable content contains the full source code of the page. I dumped it to the console to make sure it was loaded correctly, and I see, for example, the following (the markup is messy, but I took it from the page mentioned above without changing anything):

<div class="product__titles"><div class="js-product-feature-title"></div><a class="product__title js-product-title" href="/fr/produit/654109" data-trackclickevent="Internal Search, Product, Oehlbach BTX 1000 (654109) - Product title">
                Oehlbach BTX 1000
            </a></div><div class="product__review-rating"><div class="review-rating alt-compact"><div class="review-rating--rating">

When I test my regular expression on https://regex101.com/, it works there too, but somehow it doesn't in JS.

Any idea what I'm missing?

Thanks

Laurent

  • 3
    This is called parsing. Don't use Regular Expressions for parsing HTML documents. Use a DOM parser instead. – revo Feb 17 '18 at 15:35
  • 2
    [Don't use Regular Expressions for parsing HTML documents](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). – Danny Fardy Jhonston Bermúdez Feb 17 '18 at 15:39
  • 1
    Even though the regex is a killer, it works when switching from `php` to `javascript` in [regex101](https://regex101.com/r/r15XtS/1). – revo Feb 17 '18 at 15:40
  • Thanks for pointing out the obvious :) I'll give it a try with DOM parsing. – Laurent Feb 17 '18 at 16:53

2 Answers

1

In JavaScript, regular expressions are most often used with two methods: test and replace. test only tells you whether its argument matches the regular expression, whereas replace takes a second parameter: the string that replaces the matched text. Like most string methods, replace returns a new string; it does not change the input. For example:

document.write(/cats/i.test("Cats are fun. I like cats."));
And replace:

document.write("Cats are fun. I like cats.".replace(/cats/gi,"dogs"));
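
To make the point about return values concrete, here is a minimal sketch. The variable names are made up for the illustration, and it uses console.log instead of document.write so it also runs outside a browser:

```javascript
// Hypothetical variable names, used only for this illustration.
const original = "Cats are fun. I like cats.";

// test() returns a boolean and does not touch the string.
const hasCats = /cats/i.test(original);

// replace() returns a new string; the g flag replaces every occurrence.
const replaced = original.replace(/cats/gi, "dogs");

console.log(hasCats);  // true
console.log(replaced); // dogs are fun. I like dogs.
console.log(original); // Cats are fun. I like cats.
```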

Also, in JavaScript you have to escape a literal closing bracket "]", in particular inside a character class, as below:

\[([^\]\s]+).([^\]]+)\]
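
Relating this to the snippet in the question: String.prototype.match converts a non-RegExp argument with new RegExp(...), so a pattern written inside quotes keeps its surrounding slashes and the trailing gi as literal characters to match. The sketch below is only an illustration of that behaviour. It assumes content holds the markup quoted in the question, and the variable names are made up for the example. It also drops the g flag, because with g, match returns only the full matches and not the capture groups.

```javascript
// Assumption: a shortened copy of the markup quoted in the question.
const content =
  '<div class="product__titles"><a class="product__title js-product-title" ' +
  'href="/fr/produit/654109" data-trackclickevent="Internal Search, Product, ' +
  'Oehlbach BTX 1000 (654109) - Product title">\n' +
  '                Oehlbach BTX 1000\n' +
  '            </a></div>';

// Pattern as a string, as in the question: match() passes it to new RegExp(),
// so the leading "/" and the trailing "/gi" are treated as text to match.
const asString = '/<a class=\"product__title js-product-title\" href=\"(.*)\" data-trackclickevent=\"(.*)\">[\n\r\s]+(.*)[\n\r\s]+<\/a>/gi';
const stringResult = content.match(asString);

// Same pattern as a regex literal (no quotes). Without the g flag,
// match() returns the full match followed by the capture groups.
const asLiteral = /<a class="product__title js-product-title" href="(.*)" data-trackclickevent="(.*)">[\n\r\s]+(.*)[\n\r\s]+<\/a>/i;
const literalResult = content.match(asLiteral);

console.log(stringResult);     // null
console.log(literalResult[1]); // /fr/produit/654109
console.log(literalResult[3]); // Oehlbach BTX 1000
```

To collect the groups of every product on the page, matchAll with a g-flagged regex literal is one option, although the comments above recommending a DOM parser still apply.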
Mr Nsubuga
1

Ok, here is how I solved it.

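// Parse the HTML string into a detached element instead of using a regex.
var el = document.createElement( 'html' );
el.innerHTML = content;
// Collect every element that carries the product__title class.
var all_links = el.getElementsByClassName("product__title");

content needs to contain your HTML.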

Laurent