Regular Expression to match markdown and regular href sources from specific domain(s)

Question

I'm trying to append a parameter to the query string of various asset URLs hosted on filepicker, which serves from a few specific domains. Some of these URLs already contain a query string. All other URLs should be left untouched.

For example, we might have the following in markdown:

![image](https://www.filepicker.io/api/file/12x3DD5667dxfjdf/convert?w=600)

We could also have

<img src="https://www.filepicker.io/api/file/someotherfile" />

Or

<a href='https://www.filestack.com/api/file/anothersdf?sdf=3&dfdf=1'>link</a>

For the moment I'm just trying to match against one domain, and I do not want to match any references to other domains.

I've had mixed success with the following regular expression, which isn't matching all cases:

/(https:\/\/www\.filepicker.io\/api\/file\/[a-zA-Z0-9]+(\/convert)*[^)])$/is

[Don't parse HTML with regular expressions.](https://stackoverflow.com/a/1732454/7008354) — Tobias F., Dec 13 '18 at 21:32
Less is more with Regx... `www.filestack.com` is not `www.filepicker.io`, that said `/(https?:\/\/www\.(?:filepicker\.io\/|filestack\.com\/)api\/file\/[\w+?&=\/]+)/` https://regex101.com/r/DA8Gok/2 — ArtisticPhoenix, Dec 13 '18 at 21:32

score 0 · Accepted Answer · answered Dec 13 '18 at 21:38

This will match what you want:

 /(https?:\/\/www\.(?:filepicker\.io\/|filestack\.com\/)api\/file\/[\w+?&=\/]+)/

https://regex101.com/r/DA8Gok/2/
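
As a rough illustration of how this pattern could be applied to the actual task (appending to the query string), here is a minimal sketch. The question does not say which language is in use, so Python is an assumption here, and the function name `append_param` and the parameter `policy=abc` are hypothetical placeholders.

```python
import re

# The pattern from above, written as a Python raw string
# (no delimiters, so the forward slashes need no escaping).
ASSET_URL = re.compile(
    r"https?://www\.(?:filepicker\.io/|filestack\.com/)api/file/[\w+?&=/]+"
)


def append_param(text, param):
    """Append `param` (e.g. "policy=abc") to every matched asset URL in `text`."""

    def repl(match):
        url = match.group(0)
        # Use "&" if the URL already has a query string, otherwise start one with "?".
        separator = "&" if "?" in url else "?"
        return url + separator + param

    return ASSET_URL.sub(repl, text)


sample = "![image](https://www.filepicker.io/api/file/12x3DD5667dxfjdf/convert?w=600)"
print(append_param(sample, "policy=abc"))
```

Note that the character class `[\w+?&=\/]` does not include `-`, `.`, `%` or `;`, so a URL containing any of those (for example an HTML-escaped `&amp;`) would be matched only up to that character and the parameter would be inserted in the wrong place. Widen the class if your URLs can contain them.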

However, regex by itself is not well suited to parsing any kind of structured data. A more robust approach is to write a lexer/parser.

Here are some examples of ones I have written:

https://github.com/ArtisticPhoenix/MISC/tree/master/Lexers

http://artisticphoenix.com/2018/11/11/output-converter/
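
For this particular task, a lighter middle ground may be enough. The sketch below is not a full lexer and is not taken from the linked projects: it uses a deliberately loose regex only to find URL-like tokens, then hands each token to a real URL parser to check the host and rebuild the query string. Python and its standard `urllib.parse` module are assumptions, as are the names `ALLOWED_HOSTS`, `add_query_param`, and the `sig=abc` parameter.

```python
import re
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hosts taken from the examples in the question.
ALLOWED_HOSTS = {"www.filepicker.io", "www.filestack.com"}

# Loose token pattern: an http(s) URL running up to whitespace,
# a quote, an angle bracket, or a closing parenthesis.
URL_TOKEN = re.compile(r"https?://[^\s\"'<>)]+")


def add_query_param(text, key, value):
    """Add key=value to asset URLs on the allowed hosts; leave everything else alone."""

    def repl(match):
        original = match.group(0)
        try:
            parts = urlsplit(original)
        except ValueError:
            return original  # not a parseable URL, leave untouched
        if parts.hostname not in ALLOWED_HOSTS:
            return original
        if not parts.path.startswith("/api/file/"):
            return original
        # Parse the existing query (if any), add the new pair, and re-encode.
        query = parse_qsl(parts.query, keep_blank_values=True)
        query.append((key, value))
        return urlunsplit(parts._replace(query=urlencode(query)))

    return URL_TOKEN.sub(repl, text)


html = "<a href='https://www.filestack.com/api/file/anothersdf?sdf=3&dfdf=1'>link</a>"
print(add_query_param(html, "sig", "abc"))
```

The design choice is that the domain check is an exact comparison on the parsed hostname instead of part of the pattern, so adding another domain is a one-line change. One caveat: `urlencode` re-encodes the existing query values, which can normalise characters that were written differently in the source.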
