
I would greatly appreciate help from anyone who is better at regex. I am trying to get the following regex (see the URL below) to find all JavaScript comments inside the script tags of an HTML document. It will be run from Windows PowerShell.

Below is what I have so far:

(?s)(?(?=\A).*?<script[^>]*>).*?(?:\K\/\/|<\/script>.*?(?:<script[^>]*>|\z)(*SKIP)(*FAIL))

However, it still fails these requirements:
  • It should highlight all the text to the right of the "//" up to the line break.
  • It should not match URLs inside the script tag.
  • It should not be case sensitive for the script tag.

Example URL also including seven test scenarios: https://regex101.com/r/YpCJXM/1

Goal: each scenario should have its comment text highlighted, without matching anything outside of the script tag. As long as it works on regex101, I can make it work with PS!

Edit: I am fully aware you should not parse this with regex! However, I'm sure someone more skilled in regex could easily handle just completing the few scenarios needed for this task.

Edit_2: Below is another attempt:

(\/\*[\s\S]*?\*\/|([^:]|^)\/\/.*)

However, it does not restrict its matches to text inside the script tag.

Answer: Below is a slight change to MikeM's answer so that it excludes both http and https URLs:

(?si)(?<!http:|https:)\/\/[^\r\n]*(?=(?:(?!<script[^>]*>).)*<\/script>)
  • You might want to have a quick look at [this answer](https://stackoverflow.com/a/1732454/2487517) – Tibrogargan Jul 20 '21 at 20:28
  • Yes, I know you should not parse with regex. However, this is an unusual situation that would greatly benefit with being able to collect the JS comments. It doesn't need to find everything, just ideally the seven tests created in the example should pass. – lookin4help Jul 20 '21 at 20:35
  • Does this help? `\K(?<!https:)\/\/.*?[\r\n]|` – TheMadTechnician Jul 20 '21 at 20:36
  • You want a DOM parser to select all the script nodes that match JavaScript, then a Javascript parser to extract all the comments. Neither HTML nor JavaScript are regular languages. You might be able to do the 2nd part with regex, since your use case is somewhat specific. The 1st part is quite literally a single line selector. – Tibrogargan Jul 20 '21 at 20:39
  • @TheMadTechnician, thank you. However, this unfortunately also includes items outside of the script tag. – lookin4help Jul 20 '21 at 20:41
  • Bunch of javascript parser related stuff [here](/questions/2554519/javascript-parser-in-javascript) – Tibrogargan Jul 20 '21 at 20:44
  • You say you will be using the regex in Powershell, but you selected a PCRE2 option at the online regex tester and created a PCRE regex that is not compatible with .NET. It is as good as nothing. – Wiktor Stribiżew Jul 20 '21 at 20:46
  • @WiktorStribiżew, understand you're concerned with it being PCRE2, this should not be an issue :) as long as needed comments are the only ones highlighted. This is my last regex needed of 12 others and the only one my little knowledge of regex has been unable to solve. – lookin4help Jul 20 '21 at 20:53
  • Imagine you spend some more days on it and once you finally write it, you will discover you cannot use it in the target code. Why bother? – Wiktor Stribiżew Jul 20 '21 at 20:56

1 Answer

The following is not foolproof but it passes your tests:

(?si)(?<!https:)\/\/[^\r\n]*(?=(?:(?!<script[^>]*>).)*<\/script>)

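The `(?s)` flag lets `.` match line breaks, and `(?i)` makes the match case-insensitive. The negative lookbehind skips any `//` that directly follows `https:`. The positive lookahead requires that, scanning forward from the match, a closing script tag is reached before any opening script tag, which only holds when the match sits inside a script element.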

Example usage:

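# $data must be a single multi-line string (for example, read with Get-Content -Raw);
# if it is piped line by line, the lookahead can never reach the closing script tag.
$pattern = '(?si)(?<!https?:)\/\/[^\r\n]*(?=(?:(?!<script[^>]*>).)*<\/script>)'
$results = $data | Select-String $pattern -AllMatches
$results.Matches.Value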

// find this comment here
//find this comment here
//find this comment here
// find this comment here
// find this comment here
//find this comment here
// find this comment here
//find this comment here
//find this comment here with this included also!
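
Since the single pattern above is not foolproof, the same idea can also be sketched as two simpler passes: first isolate the body of each script element, then look for `//` comments only inside those bodies. This is a minimal sketch, not a definitive implementation. The sample input and the variable names (`$html`, `$scriptBodies`, `$comments`) are made up for illustration, and it is still regex-based, so it will also pick up a `//` that appears inside a JavaScript string literal.

```powershell
# Hypothetical sample input; in practice this would be the whole file as one string.
$html = @'
<html>
// not a comment: outside any script tag
<SCRIPT type="text/javascript">
var url = "https://example.com"; // trailing comment
//leading comment
</SCRIPT>
<p>http://example.org // also outside</p>
</html>
'@

# Pass 1: capture the body of every script element
# (case-insensitive; "." also matches line breaks).
$scriptBodies = [regex]::Matches($html, '(?si)<script[^>]*>(.*?)</script>') |
    ForEach-Object { $_.Groups[1].Value }

# Pass 2: inside each body, take "//" through the end of the line,
# unless the "//" directly follows "http:" or "https:".
$comments = foreach ($body in $scriptBodies) {
    [regex]::Matches($body, '(?i)(?<!https?:)//[^\r\n]*') |
        ForEach-Object { $_.Value }
}

$comments
```

With this sample input, only the two comments inside the script element should be listed. The lines outside it are never examined in the second pass, so no lookahead is needed.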
MikeM