I would like to convert this Ruby regex to a JavaScript one:

/\<br \/\>\<a href\=(.*?)\>([0-9]+\:[0-9]+)\<\/a\> \<a href\=\'.*?\' target\=\_blank\>(.*?)(?=\<\/a\>\<br\>\<p.*?\<\/p\>\<br \/\>\<a href\=.*?\>([0-9]+\:[0-9]+)\<\/a\> \<a href\=\'.*?\' target\=\_blank\>.*?\<\/a\>.*?\<br \/\>)/m

It works perfectly in Ruby, but not in the Chrome JavaScript console. I want to use it to extract some information from a webpage's HTML source (document.body.innerHTML) with a JavaScript function based on the scan method described in "JavaScript equivalent of Ruby's String#scan".

I think the lookahead (?= ) may be problematic in JavaScript, and on top of that it contains a capture group. Can it be converted at all?

Konstantin

1 Answer

In JavaScript you could do the following:

var re = new RegExp("<br /><a href=(.*?)>([0-9]+:[0-9]+)</a> <a href='.*?' target=_blank>(.*?)(?=</a><br><p.*?</p><br /><a href=.*?>([0-9]+:[0-9]+)</a> <a href='.*?' target=_blank>.*?</a>.*?<br />)", "m");

But this likely will not behave the same way, because the m modifier in Ruby makes . match every character, including line breaks, while in JavaScript m means multi-line mode, where ^ and $ match at the start and end of each line.
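A quick way to see the difference in the console is the small check below. The sample string and variable names are made up for illustration only.

```javascript
// Made-up two-line sample string.
const sample = "a\nb";

// In JavaScript, the m flag only changes ^ and $: here ^ matches
// at the start of the second line.
const caretMatchesSecondLine = /^b/m.test(sample);

// The m flag does not change . at all: it still refuses to match
// the line break between "a" and "b".
const dotCrossesLineBreak = /a.b/m.test(sample);

console.log(caretMatchesSecondLine, dotCrossesLineBreak);
```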

So if you really think you need regex to do this, and the HTML data you are matching has or could have line breaks, you will need to remove the m flag and replace .*? with a workaround such as [\S\s]*? to match those characters as well. JavaScript has traditionally had no dotall modifier that works like Ruby's m modifier; engines that support ES2018 add the s flag for this, but the character-class workaround behaves the same everywhere.
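As a rough sketch of that workaround, here is a shortened version of the pattern with [\s\S]*? in place of .*? and no m flag. The scan helper is a hypothetical stand-in for the String#scan equivalent mentioned in the question, and the sample HTML is made up. The lookahead keeps its capture group, which JavaScript handles without trouble.

```javascript
// Hypothetical helper, loosely modelled on Ruby's String#scan:
// returns one array of capture groups per match.
function scan(str, re) {
  if (!re.global) throw new Error("scan expects a regex with the g flag");
  const results = [];
  let m;
  re.lastIndex = 0;
  while ((m = re.exec(str)) !== null) {
    results.push(m.slice(1));
    // Guard against an infinite loop on zero-length matches.
    if (m[0] === "") re.lastIndex++;
  }
  return results;
}

// Made-up sample input; note the line break inside the first link text.
const html =
  "<br /><a href=x>12:30</a> <a href='u' target=_blank>first\ntitle</a>" +
  "<br /><a href=y>13:45</a> <a href='v' target=_blank>second</a>";

// Shortened, illustrative pattern (not the full one from the question).
// [\s\S]*? matches line breaks too, so no dotall mode is needed.
const re = /<br \/><a href=([\s\S]*?)>([0-9]+:[0-9]+)<\/a> <a href='[\s\S]*?' target=_blank>([\s\S]*?)(?=<\/a><br \/><a href=[\s\S]*?>([0-9]+:[0-9]+)<\/a>)/g;

const rows = scan(html, re);
console.log(rows);
```

Because the lookahead requires a following entry, the last entry in the input is not matched on its own; only its time shows up, as the fourth capture of the preceding match.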

hwnd