Non-greedy regexp matching too much in pandoc-generated markdown file

Question

The Problem

I'm trying to write a simple intermediary step in a Pandoc workflow. I have an original document in .docx, which I convert to .md using the --track-changes=all option (see the Pandoc reader options for more information). This produces a markdown file in which MS Word insertions, deletions and comments are wrapped in span tags, e.g.

[Insertion text]{.insertion id="1" author="Jamie Bowman" date="2019-04-01T11:05:00Z"}

[Deletion text]{.deletion id="1" author="Jamie Bowman" date="2019-04-01T11:05:00Z"}

[Comment body]{.comment-start id="1" author="Jamie Bowman" date="2019-04-01T11:05:00Z"}[]{.comment-end id="1"}

I want to run a regexp find-and-replace on the markdown file that effectively 'accepts' the insertions and deletions but leaves the comment spans in place. (That way, when I convert back to .docx, I get a clean .docx file containing only the comments.)

What I've tried

I have been able to accept all insertion spans and remove all deletion spans, but only when the span text does not run across more than one line. My attempt at matching across newlines matches too much, and I can't work out how to match only the exact span.

The following regexp matches almost all deletions, which I can then replace with nothing:

Find: \[(.*?)\]{.deletion(.|\n)*?}

Replace:

The same works for insertions, where a backreference keeps the text but removes the span:

Find: \[(.*?)\]{.insertion(.|\n)*?}

Replace: $1

The patterns are matching too much, though, as you can see here.
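A lazy quantifier such as `.*?` only keeps a match short from the position where it starts; the engine still begins at the leftmost `[` from which any match is possible, and `(.*?)` is free to run over `]`, `{` and earlier spans on the same line. A minimal Node.js sketch, using an invented sample line (not taken from the actual document), reproduces the effect with the deletion pattern above:

```javascript
// Invented sample: an insertion span followed by a deletion span on one line.
const sample =
  '[new]{.insertion id="1" author="A"} keep [old]{.deletion id="2" author="A"}';

// The question's deletion pattern as a JavaScript regex literal.
// The unescaped "." before "deletion" matches any character.
const questionDeletion = /\[(.*?)\]{.deletion(.|\n)*?}/g;

// Only one match is found, and it starts at "[new]", not at "[old]",
// so replacing it with nothing would also delete the inserted text.
const [first] = sample.match(questionDeletion);
console.log(first);
```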

Please let me know if anything is unclear. I've been working on this quite a bit today and it's difficult to explain the problem plainly! Thanks in advance.

Do you want `(?s)\[([^][]*)]{\.deletion.*?}`, https://regex101.com/r/zbUttw/2? And similarly, `(?s)\[([^][]*)]{\.insertion.*?}`? — Wiktor Stribiżew, Apr 16 '19 at 14:20
Thank you. I can see that it works in your example, but unfortunately it is not working in VS Code (error: Invalid group). Perhaps I made a mistake describing the RegExp flavour? I have PCRE enabled in my VS Code settings, so I'm not sure. — jbowman, Apr 16 '19 at 14:54
Answer to my question above [here](https://stackoverflow.com/questions/42179046/what-flavor-of-regex-does-visual-studio-code-use): the regex does not work because VS Code only supports JavaScript-compatible regexps. — jbowman, Apr 16 '19 at 15:18
If you have VSCode, it has nothing to do with PCRE (the option is called that, but all PCRE-specific features are removed; this is described somewhere in the VSCode docs). You need `\[([^\][\n]*)]{\.deletion[\s\S\n]*?}` — Wiktor Stribiżew, Apr 16 '19 at 18:41
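JavaScript regexes have no `(?s)` inline flag, which is what triggers "Invalid group"; the usual substitute is the class `[\s\S]`, which matches any character including newlines. The extra `\n` inside `[\s\S\n]` is redundant for the JavaScript engine itself; my understanding is that it is there because VS Code only switches its search to multiline mode when the pattern contains `\n`, but treat that as an assumption. A sketch running the suggested pattern in Node.js on an invented sample whose attribute list wraps onto a second line:

```javascript
// Invented sample: the deletion's attributes wrap onto a second line.
const md = [
  'Keep this [old words]{.deletion id="3" author="Jamie Bowman"',
  'date="2019-04-01T11:05:00Z"} and this.',
].join("\n");

// The pattern from the comment, as a JavaScript regex literal.
// [^\][\n]* keeps the span text on one line (no "[", "]" or newline);
// [\s\S\n]*? lazily crosses newlines up to the first "}".
const deletion = /\[([^\][\n]*)]{\.deletion[\s\S\n]*?}/g;

// The whole deletion span, including the wrapped attributes, is removed.
const cleaned = md.replace(deletion, "");
console.log(cleaned);
```

Note that because the span text is restricted to `[^\][\n]*`, this variant handles wrapped attributes but not deletion text that itself runs onto a second line.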

Answer

The following regex should match the deletion fragments:

\[[^[]*?\]{\.deletion.*?}
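To see what this pattern changes, here is a small Node.js check on an invented sample containing an insertion followed by a deletion whose text wraps onto a second line. Because `[^[]` matches anything except `[`, newlines included, the span text can cross lines, yet a match can no longer start at the earlier `[new]` span and run forward over it:

```javascript
// Invented sample: an insertion, then a deletion whose text wraps.
const md = [
  '[new]{.insertion id="1" author="A"} then [some old',
  'words]{.deletion id="2" author="A"} end',
].join("\n");

// The answer's deletion pattern as a JavaScript regex literal.
const deletion = /\[[^[]*?\]{\.deletion.*?}/g;

// Only the deletion span is removed; the insertion span is left intact.
const result = md.replace(deletion, "");
console.log(result);
```

One case the pattern does not handle: `.` never matches `\n`, so if the attribute list itself wraps onto a new line, `.*?}` cannot reach the closing brace. Swapping `.*?}` for `[^}]*}` would cover that; this tweak is my own suggestion rather than part of the answer.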

The regex for the insertions is mostly the same, except that it needs a capturing group, ([^[]*?), so the text can be kept with $1:

\[([^[]*?)\]{\.insertion.*?}
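Putting the two patterns together, a sketch of the whole "accept changes, keep comments" step in Node.js might look like the following. The function name `acceptChanges` and the sample text are hypothetical; in VS Code the same two find/replace passes would be run by hand, with `$1` as the insertion replacement.

```javascript
// Hypothetical helper: accepts tracked changes in pandoc markdown by
// dropping deletion spans and unwrapping insertion spans.
function acceptChanges(markdown) {
  const deletion = /\[[^[]*?\]{\.deletion.*?}/g;
  const insertion = /\[([^[]*?)\]{\.insertion.*?}/g;
  return markdown
    .replace(deletion, "") // remove deleted text together with its span
    .replace(insertion, "$1"); // keep inserted text, drop the span markup
}

// Invented sample with one insertion, one deletion and one comment.
const doc = [
  'Start [added]{.insertion id="1" author="A"}',
  '[removed]{.deletion id="2" author="A"} middle',
  '[A note]{.comment-start id="3" author="A"}text[]{.comment-end id="3"}',
].join("\n");

const out = acceptChanges(doc);
console.log(out);
```

Deletions are removed first and insertions unwrapped second; the comment spans match neither pattern, so they pass through unchanged.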

answered Apr 16 '19 at 14:21

Ildar Akhmetov
