0

I've a bunch of Markdown links with whitespace, and I need to replace the whitespace with %20. So far I've hacked a few solutions, but none that work in VSCode, or do exactly what I'm looking for.

This is the URL format conversion I need:

[My link](../../_resources/my resource.jpg)
[My link](../../_resources/my%20resource.jpg)

\s+(?=[^(\)]*\)) matches any whitespace inside parentheses, but it gives false positives because it matches inside any pair of parentheses, not just link URLs.
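To see the false positive concretely, here is a minimal Python sketch. The sample string is invented for illustration; Python's re module accepts this pattern because it only uses a lookahead.

```python
import re

# The lookahead-only pattern from above: whitespace that is followed by a ")"
# with no other parenthesis in between.
pattern = re.compile(r"\s+(?=[^(\)]*\))")

# Hypothetical sample: an ordinary parenthesised remark plus a Markdown link.
sample = "note (a b) [My link](x y.jpg)"

result = pattern.sub("%20", sample)
print(result)  # note (a%20b) [My link](x%20y.jpg)
```

The space inside the link URL is replaced as intended, but so is the space inside the unrelated "(a b)".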

(?:\]\(|(?!^)\G)[^]\s]*\K\h+ does the job, but I'm getting some "Invalid Escape Character" messages in VSCode, so I assume VSCode's regex flavor doesn't support it.

I've been trying to identify the link by the characters ]( but, as I'm relatively new to regex, I'm struggling a bit.

I tried (?<=\]\()s\+, since (?<=\]\().+ correctly identifies the URL, but it doesn't work.

Where am I going wrong here? Thanks in advance!

EDIT: VSCode find in files doesn't support variable length lookbehind, even though find/replace in the open file does support this. Open to any other solutions before I dive into writing a script!

jt196
  • You can't do that with a single regex pass in VSCode, so use Notepad++ that has a Boost regex engine and also provides Find/Replace in Files option. The regex is `(\G(?!\A)|\[[^][]*]\()([^()\s]*)\s+(?=[^()]*\))` and replace with `$1$2%20`. – Wiktor Stribiżew Nov 03 '21 at 22:21

3 Answers

2

VSCode's regex engine does not support \K, \G, or \h, but it does support variable-width lookbehinds. So, you may use something like the following:

(?<=\]\([^\]\r\n]*)[^\S\r\n]+


1

You can use

(?<=\]\([^\]]*)\s+(?=[^()]*\))

Replace with %20.

Details:

  • (?<=\]\([^\]]*) - a positive lookbehind that matches a location that is immediately preceded with ]( and then any zero or more chars other than ]
  • \s+ - any one or more whitespace chars (other than line break chars: in Visual Studio Code, if there is no \n or \r in the regex, \s does not match line break chars)
  • (?=[^()]*\)) - a positive lookahead that matches a location that is immediately followed with zero or more chars other than ( and ) and then a ) char.

Since you are using Find/Replace in Files, this lookbehind solution won't work there.

You can use Notepad++ with

(\G(?!\A)|\[[^][]*]\()([^()\s]*)\s+(?=[^()]*\))

and $1$2%20 replacement pattern. In Notepad++, press CTRL+SHIFT+F and after filling out the necessary fields, hit Replace in Files.


Wiktor Stribiżew
  • Didn't know that VSCode doesn't match vertical whitespace characters by default. That makes this answer better than mine. Have my upvote :) – 41686d6564 stands w. Palestine Nov 03 '21 at 19:47
  • @41686d6564 More, `\[[^\]\[]*\]` does not match across lines either, so it also affects negated character classes. This is compliant with Vim, though here the similarity ends: in Vim, one needs to use `\_` to enable line break matching, while in VSCode, `\r` or `\n` must be added anywhere in the pattern (even `\n{0}` will do, though it makes little sense in general). – Wiktor Stribiżew Nov 03 '21 at 21:15
  • Thanks for both your answers folks - annoyingly, this works for the find-replace in a file, but not in the find in files (ctrl/cmd-shift-f). Looks like they're working on slightly different versions of Regex - the find in files doesn't support variable lookbehinds. – jt196 Nov 03 '21 at 21:31
  • @jt196 Right, it won't, since Find/Replace in Files uses the Rust regex engine, not the ECMAScript 2018 compliant one used by in-document search and replace. You will need to use your regex in the Notepad++ Replace in Files feature. – Wiktor Stribiżew Nov 03 '21 at 21:32
  • [SO post](https://stackoverflow.com/questions/42179046/what-flavor-of-regex-does-visual-studio-code-use/54227294#54227294) with some more info. – jt196 Nov 03 '21 at 21:35
  • Also, and I guess this is my bad for having the idea, but this will match any whitespace until the end of the line after the `](` appearance. See [this example](http://regexstorm.net/tester?p=%28%3f%3c%3d%5c%5d%5c%28%5b%5e%5c%5d%5d*%29%5cs%2b&i=%23%23%23+Please+%5blogin%5d%28https%3a%2f%2fwww.accountingweb.co.uk%2fuser%2flogin%3fdestination%3dnode%2f81571%29+or+%5bregister%5d%28https%3a%2f%2fwww.accountingweb.co.uk%2fuser%2fregister%3fdestination%3dnode%2f81571%26referrer%3dcomment%29+to+join+the+discussion.&r=%2520). – jt196 Nov 03 '21 at 22:15
  • @jt196 Added a fix for the VSCode in-document S&R solution, and a NPP workaround for Replace in Files. – Wiktor Stribiżew Nov 03 '21 at 22:29
  • Weirdly, your fix works [here](http://regexstorm.net/tester?p=%28%3f%3c%3d%5c%5d%5c%28%5b%5e%5c%5d%5d*%29%5cs%2b%28%3f%3d%5b%5e%28%29%5d*%5c%29%29&i=I+quoted+Michael+Porter+in+my+%5bALA+article%5d%28http%3a%2f%2fwww.alistapart.com%2farticles%2fredesignrealign%29+and+I%e2%80%99ll+do+so+again+here%3a+*%e2%80%9cThe+essence+of+strategy+is+choosing+what+not+to+do.%e2%80%9d&r=) but not in VSCode. Still getting the post link whitespace matches. Dude, don't pull your hair out over this you've already done enough! – jt196 Nov 03 '21 at 23:38
  • @jt196 It works in VSCode, but only in the in-document S&R feature. As I said, it supports the ECMAScript 2018+ regex syntax, and the File search/replace is based on Rust regex, it won't allow any lookarounds. I really suggest switching to Notepad++ to replace in multiple files. – Wiktor Stribiżew Nov 03 '21 at 23:41
0

In the end, as I'm on a Mac and didn't want to fire up a virtual PC to run Notepad++ (Sublime uses the same engine and Atom doesn't allow you to exclude files), I used a Python script, combined with @Wiktor Stribizew's answer for individual files that weren't picked up by the pattern for whatever reason.

import os
import re

# Group 1 is the "[text]" part, group 2 is the URL inside the parentheses.
# [^\]]* (rather than a greedy .+) stops at the first "]", so several links
# on the same line are matched one by one instead of as a single match.
md_url_pattern = r'(\[[^\]]*\])\(([^)]+)\)'

def remove_spacing(match_obj):
    # Replace every run of whitespace in the URL with %20.
    fixed = match_obj.group(1) + "(" + re.sub(r"\s+", "%20", match_obj.group(2)) + ")"
    print("Match Object: " + fixed)
    return fixed

# THIS_FOLDER = os.path.dirname(os.path.abspath(__file__))
this_folder = '<my_document_folder>' # fixed folder path
note_path = '<note_folder>' # change this
full_path = os.path.join(this_folder, note_path)

for name in os.listdir(full_path):
    path = os.path.join(full_path, name)
    if not os.path.isfile(path):
        continue  # skip subfolders; they are not processed
    with open(path, 'r') as open_file:
        read_file = open_file.read()
    if not read_file:
        print("Empty file!")
        continue
    # "with" closes the file, so the new content is flushed to disk.
    with open(path, 'w') as write_file:
        write_file.write(re.sub(md_url_pattern, remove_spacing, read_file))

This script could do with a bit of tidying up and debugging (the odd weird empty file and no subfolder compatibility) but it was the best I could do.
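For the missing subfolder support, one possible direction is a recursive walk with pathlib. This is only a sketch under assumptions: the names encode_link_spaces and fix_tree are made up here, the notes are assumed to be UTF-8 files ending in .md, and the link pattern uses [^\]]* for the link text so that two links on one line are matched separately.

```python
import re
from pathlib import Path

# Matches "[text](url)"; group 1 is the bracketed text, group 2 the URL.
MD_LINK = re.compile(r"(\[[^\]]*\])\(([^)]+)\)")

def encode_link_spaces(text: str) -> str:
    """Return text with whitespace runs inside Markdown link URLs replaced by %20."""
    return MD_LINK.sub(
        lambda m: m.group(1) + "(" + re.sub(r"\s+", "%20", m.group(2)) + ")",
        text,
    )

def fix_tree(root: str) -> int:
    """Rewrite every *.md file under root, recursively; return how many changed."""
    changed = 0
    for path in Path(root).rglob("*.md"):
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        updated = encode_link_spaces(original)
        if updated != original:  # only write files that actually need a change
            path.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Calling fix_tree('<my_document_folder>') would then cover nested note folders in one pass. Because files are only rewritten when their content changes, running it twice is harmless.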

jt196