0

I'm exploring the possibilities of tagging Markdown files with JSON data structures.

The JSON data can be kept out of printouts by putting it in "comments"; see StackOverflow - Comments in Markdown

[json]:# (
[
    "json goes here"
]
)

To extract the JSON tags I've been playing around with regexps and came up with

\[json]:#.\(([^*]*)\)

This, however, only works if there is exactly one [json] tag in the md file.
(See One tag)

With more than one tag, the regexp gets greedy and includes everything between the first and the last tag :/
(See Multiple tags)
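The greedy behaviour can be reproduced on a shorter input. The following is only a minimal sketch: `$sample`, `$greedy` and `$lazy` are names made up for this illustration, and the lazy variant simply adds `?` to the original quantifier. It stops at the first `)` it meets, so it assumes the JSON itself contains no `)`.

```powershell
# Hypothetical two-tag sample (not the full document from the question).
$sample = @'
[json]:# (
[
    {"jira": "proj-4753"}
]
)

# Title

[json]:# (
[
    {"sensitivity": "internal"}
]
)
'@

$greedy = '\[json]:#.\(([^*]*)\)'    # pattern from the question
$lazy   = '\[json]:#.\(([^*]*?)\)'   # same pattern with a lazy quantifier

# Greedy: [^*]* runs to the end of the text and backtracks to the LAST ')'.
$greedyCount = [regex]::Matches($sample, $greedy).Count

# Lazy: [^*]*? expands one character at a time and stops at the FIRST ')'.
$lazyCount = [regex]::Matches($sample, $lazy).Count

"greedy: $greedyCount match(es), lazy: $lazyCount match(es)"
```

On this sample the greedy pattern yields a single match spanning both tags, while the lazy one yields one match per tag.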

This is sample code for reproducing the issue

$md = @'
[json]:# (
[
    {"jira": "proj-4753"},
    {"creation": "2021-09-25"}
]
)

# Title

## 1. Conclusion
Jada, jada

## 2. Recomendation
blah, blah

[json]:# (
[
    {"sensitivity": "internal"}
]
)

More data

[json]:# (
[
    {"uid": "abc002334"}
]
)
[json]:# (
[
    {"mode": "hallow"}
]
)
and this
'@

($md | Select-String '\[json]:#.\(([^*]*)\)').Matches.Value

No output is shown, as it would be the complete md except the last line.

...

With multiple tags, the expected output when selecting tag 2 would be:

($md | Select-String '<a working regexp>' -AllMatches).Matches[1].Value

[json]:# (
[
    {"sensitivity": "internal"}
]
)

($md | Select-String '<a working regexp>' -AllMatches).Matches.Value

[json]:# (
[
    {"jira": "proj-4753"},
    {"creation": "2021-09-25"}
]
)
[json]:# (
[
    {"sensitivity": "internal"}
]
)
[json]:# (
[
    {"uid": "abc002334"}
]
)
[json]:# (
[
    {"mode": "hallow"}
]
)

I could of course opt for using only one [json] tag per md file.
Another option is to keep each tag on a single line, but that would hamper readability.
Neither makes for very robust code, since two very likely scenarios (multiple tags and multi-line tags) would break it.

Dennis
  • Match until you encounter a `)` preceded by a newline: `'\[json]:#.\(([^*]*?)(?<=\n)\)'` – Mathias R. Jessen Sep 25 '21 at 13:43
  • @MathiasR.Jessen Sweet :) If you add that as an answer I'd have the option to mark it as the solution. – Dennis Sep 25 '21 at 13:50
  • But you do not need the lookbehind, `'\[json]:#\s*\(([^*]*?\n)\)'` will work the same – Wiktor Stribiżew Sep 25 '21 at 13:52
  • @WiktorStribiżew That also works. Could you elaborate what "lookbehind" is in this case? – Dennis Sep 25 '21 at 13:53
  • `(?<=\n)` is a lookbehind. I do not understand why you use `[^*]*`. Is `*` some kind of a separator here? I do suggest `(?sm)\[json]:#\s*\((.*?)^\)` – Wiktor Stribiżew Sep 25 '21 at 13:55
  • `*` doesn't include line breaks. So to include absolutely everything in between starting tag and ending tag, I need to specify the opposite as well `^*` – Dennis Sep 25 '21 at 13:56
  • @WiktorStribiżew I think it's to work around the fact that `.` doesn't match newlines by default and the comment might span multiple lines. Might be worth doing `(?s)\[json]:#.\(\s*(.*?)\s*\n\)` instead – Mathias R. Jessen Sep 25 '21 at 13:56
  • Yes, but a workaround like that is actually an error in this case. Once a string between the delimiters contains a `*`, this match will fail. – Wiktor Stribiżew Sep 25 '21 at 13:58
  • The best solution provided seems impossible to choose. I will go with something I (think) I understand but is sufficiently generic and still as fast as possible :) – Dennis Sep 25 '21 at 14:14

3 Answers

1

Since a valid JSON comment will always end with a newline \n followed by a closing parenthesis ), use that as your end-of-pattern anchor:

if($md -match '(?s)\[json]:#.\(\s*(.*?)\s*\n\)'){
  $Matches[1]
}

(?s) is the regex engine option for "single-line mode"; it makes . match newline characters, allowing us to capture across multiple lines with .*?.

$Matches is an automatic variable that gets populated with all capture group values when a -match operation succeeds in scalar mode.

Result:

[
    {"jira": "proj-4753"},
    {"creation": "2021-09-25"}
]
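
To collect every tag rather than only the first, the same pattern can be handed to `[regex]::Matches()`. This is a hedged sketch and not part of the original answer: `$doc` is a shortened stand-in for the question's `$md`, and `$jsonBlocks` is a name chosen for illustration.

```powershell
# Shortened stand-in for the question's $md (two tags only).
$doc = @'
[json]:# (
[
    {"jira": "proj-4753"}
]
)

# Title

[json]:# (
[
    {"sensitivity": "internal"}
]
)
'@

$pattern = '(?s)\[json]:#.\(\s*(.*?)\s*\n\)'

# [regex]::Matches returns every non-overlapping match, not just the first.
# @() keeps the result an array even if only one tag is found.
$jsonBlocks = @(foreach ($m in [regex]::Matches($doc, $pattern)) {
    $m.Groups[1].Value   # capture group 1: the JSON text without the wrapper
})

$jsonBlocks.Count                                  # number of tags found
($jsonBlocks[1] | ConvertFrom-Json).sensitivity    # a value from the second tag
```

Each captured string is plain JSON, so it can be passed straight to `ConvertFrom-Json`.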
Mathias R. Jessen
  • Is `$Matches` a generic variable such as `$Error`? Seems to be. – Dennis Sep 25 '21 at 14:27
  • @Dennis [`$Matches` is an automatic* variable](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_automatic_variables#matches), yes :) – Mathias R. Jessen Sep 25 '21 at 14:38
  • There's something fishy with `$md -match `. It only returns two hits to `$Matches` (there should be three), of which the second is incomplete. – Dennis Sep 25 '21 at 14:43
  • Ok, got it. You are actually getting only one hit, and the two objects are the first and second regexp grouping results :) – Dennis Sep 25 '21 at 14:53
  • I'm trying to concisely showcase a solution to the problem with your regex pattern, feel free to appropriate it as necessary :) `Select-String` will do, as will `[regex]::Matches()` – Mathias R. Jessen Sep 25 '21 at 15:03
  • Yes, `Select-String` can handle multiple results, thx :) – Dennis Sep 25 '21 at 15:38
1

I suggest using

(?sm)\[json]:#\s*\((.*?)^\)

See the regex demo. Details:

  • (?sm) - turns on s (the RegexOptions.Singleline inline option, enabling . to match line break chars) and m (the RegexOptions.Multiline inline option, which makes ^ match any line start position and $ match any line end position)
  • \[json]:# - a literal [json]:# substring
  • \s* - zero or more whitespaces
  • \( - a ( char
  • (.*?) - Group 1: any zero or more chars as few as possible
  • ^\) - a ) char at the start of a line.
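
As a rough illustration of how this pattern behaves with `Select-String -AllMatches`, here is a sketch under stated assumptions: `$text` is a shortened stand-in for the question's `$md`, `$found` is a name chosen for this example, and no line inside the JSON is expected to start with `)`.

```powershell
# Shortened stand-in for the question's $md (two tags only).
$text = @'
[json]:# (
[
    {"jira": "proj-4753"}
]
)

## 1. Conclusion

[json]:# (
[
    {"sensitivity": "internal"}
]
)
'@

$pattern = '(?sm)\[json]:#\s*\((.*?)^\)'

# The whole here-string is piped as ONE string, so a match may span lines.
$found = ($text | Select-String -Pattern $pattern -AllMatches).Matches

$found.Count                       # number of tags found
$found[1].Value                    # second tag, including the [json]:# ( ... ) wrapper
$found[1].Groups[1].Value.Trim()   # only the JSON of the second tag
```

Group 1 keeps the line breaks around the JSON, hence the `Trim()` before further processing.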
Wiktor Stribiżew
-1

(?s) lets . match line endings too. The ? after * makes it a lazy match. In PowerShell 7, Select-String highlights only the first three lines as a match in the example below. I grouped what's in between, excluding the line endings. The other way to go is a positive lookbehind and a positive lookahead.

'one
two
three
one
four
three' | select-string '(?s)one.(.*?).three'

This is highlighted:

one
two
three

$one = [regex]::escape('[json]:# (')
$three = [regex]::escape(')')
$md | select-string "(?s)$one.(.*?).$three"

Only this is highlighted:

[json]:# (
[
    {"jira": "proj-4753"},
    {"creation": "2021-09-25"}
]
)

Showing the group match:

$md | select-string "(?s)$one.(.*?).$three" | % matches | % groups | 
  % value | select -last 1

[
    {"jira": "proj-4753"},
    {"creation": "2021-09-25"}
]
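
The lookbehind/lookahead route mentioned above could look roughly like the following. This is a sketch and not part of the original answer: `$notes` is a shortened stand-in for the question's `$md`, `$jsonTexts` is a name chosen for illustration, and the pattern assumes each tag opens with exactly `[json]:# (` followed by a line break.

```powershell
# Shortened stand-in for the question's $md (two tags only).
$notes = @'
[json]:# (
[
    {"jira": "proj-4753"}
]
)

More data

[json]:# (
[
    {"uid": "abc002334"}
]
)
'@

# (?<=...) requires the opener before the match and (?=...) requires the
# closer after it; neither is included in the matched text.
# \r?\n tolerates both Windows and Unix line endings.
$pattern = '(?s)(?<=\[json]:# \(\r?\n).*?(?=\r?\n\))'

$jsonTexts = @(($notes | Select-String -Pattern $pattern -AllMatches).Matches.Value)

$jsonTexts.Count                           # number of tags found
($jsonTexts[1] | ConvertFrom-Json).uid     # a value from the second tag
```

Because the wrapper is matched only by the lookarounds, each match value is already the bare JSON and no capture group is needed.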
js2010