
I use this regex, which works, but I would like to optimize it. It strikes me as very "literal", and I suspect that slows processing down. Using tokens should be faster, but how to do that is not obvious for a beginner...

const pattern = /left\s:([-]{0,1}[0-9]{1,4})px;\stop:([0-9]{1,4})px;"\s\s\tonmouseover="updi\(event,'([0-9]{4}-[0-9]{2}-[0-9]{2})\s([0-9]{2}:[0-9]{2})\s([A-Z]{3,4})\s\((T[+]{1}\s[0-9]{1}:[0-9]{2}|T[+]{1}[0-9]{2,3}:[0-9]{2})\)<br>Distances:&nbsp;([0-9]{1,4}\.[0-9]{1}nm)\/([0-9]{1,4}\.[0-9]{1}nm)<br><b>Wind:<\/b>\s([0-9]{1,3}&deg);\s([0-9]{1,2}\.[0-9]{1}\skt)\s\(<b>TWA\s([-]{0,1}[0-9]{1,3}&deg);<\/b>\)<br><b>Heading:<\/b>\s([0-9]{1,3}&deg);<b>Sail:<\/b>\s([a-zA-Z]{2,4})<br><b>Boat\sSpeed:<\/b>\s([0-9]{1,3}\.[0-9]{2}\skts)/

It extracts the values that I use later, and it runs against this kind of markup:

<img src="img/dot.png" alt="" class="abs" style="z-index: 1; left :-4904px; top:2437px;" 
onmouseover="updi(event,'2019-10-15 02:00 CEST (T+ 1:50)<br>Distances:&nbsp;1271.8nm/447.1nm<br><b>Wind:</b> 295&deg; 5.8 kt (<b>TWA 65&deg;</b>)<br><b>Heading:</b> 230&deg;<b>Sail:</b> Jib<br><b>Boat Speed:</b> 3.23 kts','220px')" onmouseout="cleari()" 
onmousedown="show_wind(366);">
<img src="img/dot.png" alt="" class="abs" style="z-index: 1; left :49px; top:243px;" 
onmouseover="updi(event,'2019-10-15 02:00 CET (T+363:50)<br>Distances:&nbsp;1271.8nm/447.1nm<br><b>Wind:</b> 295&deg; 5.8 kt (<b>TWA 65&deg;</b>)<br><b>Heading:</b> 230&deg;<b>Sail:</b> Jib<br><b>Boat Speed:</b> 3.23 kts','220px')" onmouseout="cleari()" 
onmousedown="show_wind(366);">
GeGaX

1 Answer


This is MUCH better solved with an HTML parser. If the HTML differs even slightly from what your regex expects, the match breaks. Parsing HTML with regex is a bad idea.
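As a rough illustration of the parser route, here is a minimal sketch. It assumes `doc` is a DOM `Document`, for example one produced in a browser by `new DOMParser().parseFromString(html, "text/html")`; the function name `extractTooltips` and the shape of the returned objects are made up for this example. A small regex is still used to pull the quoted argument out of the `updi(...)` call, but the attribute lookup itself is left to the DOM.

```javascript
// Sketch only: `doc` is assumed to be a DOM Document (e.g. from the browser's DOMParser).
// extractTooltips is a hypothetical helper name, not part of any library.
function extractTooltips(doc) {
  const tooltips = [];
  for (const img of doc.querySelectorAll("img[onmouseover]")) {
    const handler = img.getAttribute("onmouseover");
    // The tooltip is the first single-quoted argument of updi(event,'...').
    const quoted = /updi\(event,'([^']*)'/.exec(handler);
    if (quoted === null) continue;
    // A real parser has already decoded entities in attribute values, so &nbsp;
    // arrives as U+00A0. Split on the literal <br>, then strip the remaining tags.
    const lines = quoted[1]
      .split("<br>")
      .map((part) => part.replace(/<[^>]+>/g, "").replace(/\u00a0/g, " ").trim());
    tooltips.push({ style: img.getAttribute("style"), lines });
  }
  return tooltips;
}
```

Each entry then holds one short line per `<br>`-separated field (date, distances, wind, heading and sail, boat speed), and each of those can be picked apart with a much smaller pattern. Outside a browser, a DOM implementation would have to come from a library, which is an assumption beyond what the question shows.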

...But if you want to optimize your regex, that can be done.

  • [0-9] can be replaced with \d
  • [a-zA-Z0-9_] can be replaced with \w (note that \w also matches digits and the underscore, so it is looser than [a-zA-Z])
  • [-]{0,1} can be replaced with -?
  • T[+]{1} can be replaced with T\+
  • Unimportant text can often be skipped with .*, IF you have a well-defined pattern after it
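The substitutions listed above can be sanity-checked with a short sketch like the following, which compares the long-hand and shorthand forms on a few values taken from the sample HTML. The helper name `sameResult` is invented for this illustration.

```javascript
// Hypothetical helper: true when both patterns give the same first match for every input.
function sameResult(longForm, shortForm, inputs) {
  return inputs.every((s) => {
    const a = s.match(longForm);
    const b = s.match(shortForm);
    return (a === null && b === null) || (a !== null && b !== null && a[0] === b[0]);
  });
}

const samples = ["-4904px", "49px", "T+ 1:50", "T+363:50", "Jib", "CEST"];

const digitsEquivalent = sameResult(/[-]{0,1}[0-9]{1,4}px/, /-?\d{1,4}px/, samples);
const offsetEquivalent = sameResult(/T[+]{1}\s?[0-9]{1,3}:[0-9]{2}/, /T\+\s?\d{1,3}:\d{2}/, samples);

// \w is wider than [a-zA-Z]: it also accepts digits and the underscore.
const wordAcceptsMixed = /^\w{2,4}$/.test("J1_");
const lettersAcceptMixed = /^[a-zA-Z]{2,4}$/.test("J1_");

console.log(digitsEquivalent, offsetEquivalent, wordAcceptsMixed, lettersAcceptMixed);
// prints: true true true false
```

The last two checks show why swapping `[a-zA-Z]` for `\w` is a slight loosening rather than a pure rewrite, which is harmless here as long as the sail and time-zone fields only ever contain letters.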

Here's a working, reduced version of your regex. I'm not sure whether it's "faster", since the one you posted doesn't work. Let me reiterate that I strongly recommend you use an HTML parser instead.

left :(-?\d{1,4})px; top:(-?\d{1,4})px;"\s*onmouseover="updi\(event,'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}) (\w{3,4}) \((T\+ ?\d{1,3}:\d{2})\)<br>Distances:&nbsp;(\d{1,4}\.\dnm)\/(\d{1,4}\.\dnm)<br><b>Wind:<\/b> (\d{1,3}&deg); (\d{1,2}\.\d kt) \(<b>TWA (-?\d{1,3}&deg);<\/b>\)<br><b>Heading:<\/b> (\d{1,3}&deg);<b>Sail:<\/b> (\w{2,4})<br><b>Boat Speed:<\/b> (\d{1,3}\.\d{2} kts)
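To show how the reduced pattern might be used, here is a small sketch that runs it over the two sample `<img>` tags from the question and turns each match into an object. The `g` flag is added so that `matchAll` can be used, and the field names in `fields` are labels invented for this example (the source does not say what the two distances mean, so they are simply numbered).

```javascript
// The reduced pattern from above, with the global flag added for matchAll.
const reduced = /left :(-?\d{1,4})px; top:(-?\d{1,4})px;"\s*onmouseover="updi\(event,'(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}) (\w{3,4}) \((T\+ ?\d{1,3}:\d{2})\)<br>Distances:&nbsp;(\d{1,4}\.\dnm)\/(\d{1,4}\.\dnm)<br><b>Wind:<\/b> (\d{1,3}&deg); (\d{1,2}\.\d kt) \(<b>TWA (-?\d{1,3}&deg);<\/b>\)<br><b>Heading:<\/b> (\d{1,3}&deg);<b>Sail:<\/b> (\w{2,4})<br><b>Boat Speed:<\/b> (\d{1,3}\.\d{2} kts)/g;

// The two sample tags from the question.
const html = `<img src="img/dot.png" alt="" class="abs" style="z-index: 1; left :-4904px; top:2437px;"
onmouseover="updi(event,'2019-10-15 02:00 CEST (T+ 1:50)<br>Distances:&nbsp;1271.8nm/447.1nm<br><b>Wind:</b> 295&deg; 5.8 kt (<b>TWA 65&deg;</b>)<br><b>Heading:</b> 230&deg;<b>Sail:</b> Jib<br><b>Boat Speed:</b> 3.23 kts','220px')" onmouseout="cleari()"
onmousedown="show_wind(366);">
<img src="img/dot.png" alt="" class="abs" style="z-index: 1; left :49px; top:243px;"
onmouseover="updi(event,'2019-10-15 02:00 CET (T+363:50)<br>Distances:&nbsp;1271.8nm/447.1nm<br><b>Wind:</b> 295&deg; 5.8 kt (<b>TWA 65&deg;</b>)<br><b>Heading:</b> 230&deg;<b>Sail:</b> Jib<br><b>Boat Speed:</b> 3.23 kts','220px')" onmouseout="cleari()"
onmousedown="show_wind(366);">`;

// Hypothetical labels for the 14 capture groups, in order.
const fields = [
  "left", "top", "date", "time", "zone", "elapsed", "distance1", "distance2",
  "windDir", "windSpeed", "twa", "heading", "sail", "boatSpeed",
];

// One object per matched <img>, keyed by the labels above.
const points = [...html.matchAll(reduced)].map((m) =>
  Object.fromEntries(fields.map((name, i) => [name, m[i + 1]]))
);

console.log(points.length);                               // 2
console.log(points[0].left, points[0].zone);              // -4904 CEST
console.log(points[1].elapsed, points[1].boatSpeed);      // T+363:50 3.23 kts
```

Note that the degree values are captured without the trailing semicolon (for example `295&deg`), exactly as the capture groups are written in the pattern.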


Nick Reed
  • Thanks, the explanation makes this much clearer (especially with the matches). On regex101 the processing time is the same. I had assumed that using tokens would be faster, but it isn't; it just takes less time to write. Thanks for the regex and the explanations. – GeGaX Oct 01 '19 at 20:57
  • @GeGaX you're welcome. At <600 steps, your regex is already very efficient, and probably doesn't need to be refined. Please remember to accept the answer if it addresses your question, and if you need further clarification, feel free to let me know. – Nick Reed Oct 01 '19 at 21:03