I have this regular expression (Java / JavaScript):

/(http|ftp|https):\/\/([\w+?\.\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\-\\=\\+\\\\\/\\?\\.\\:\\;\\'\\,]*\.(?:jpg|JPG|jpeg|JPEG|gif|GIF|png|PNG|bmp|BMP|tiff|TIFF))?/

But it seems to have issues with a URL like this one:

https://cdn.vox-cdn.com/thumbor/C07imD1SHmAnbObkg-nJ92N6sD8=/0x0:4799x3199/920x613/filters:focal(2017x1217:2783x1983):format(webp)/cdn.vox-cdn.com/uploads/chorus_image/image/62871037/seattle.0.jpg

What do you think is missing in my expression? I want to accept any valid image URL.

bobble bubble
Tlink
  • Try this one: ```(http|ftp|https):[/]{2}([\w+?\.\w+])+([a-zA-Z0-9~!@#$%^&*()_\-=+/?.:;',]*\.(?:jpg|JPG|jpeg|JPEG|gif|GIF|png|PNG|bmp|BMP|tiff|TIFF))?``` – accdias Nov 03 '19 at 18:26
  • From the [regex tag info](https://stackoverflow.com/tags/regex/info): "Since regular expressions are not fully standardized, all questions with this tag should also include a tag specifying the applicable programming language or tool." – Toto Nov 03 '19 at 18:26
  • Why all these escaped characters? Why not use the case-insensitive flag? – Toto Nov 03 '19 at 18:27
  • Right, your regex seems to be fine, so it depends on the environment – Anatoliy R Nov 03 '19 at 18:27
  • Java/JavaScript? – Stephen Kennedy Nov 03 '19 at 19:25
  • Thanks all | @Toto I didn't get that, I'm escaping because Java requires that. – Tlink Nov 03 '19 at 19:40
  • The regex can be simplified to `/(?i)(?:ftp|https?):\/\/[-\w~!@#$%^&*()=+\\/?.:;',]+\.(?:jpe?g|gif|png|bmp|tiff)/` – Toto Nov 04 '19 at 10:06

1 Answer

Your expression works for me in the validator I tested with (regex101.com). However, it matches as 3 separate capture groups. To capture it all as a single match, just wrap the whole expression in a set of parentheses.

Note: to be clear, there are simpler ways to do this, but to answer the specific question that the OP asked, this will make their statement match their supplied link.

((http|ftp|https):\/\/([\w+?\.\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\-\\=\\+\\\\\/\\?\\.\\:\\;\\'\\,]*\.(?:jpg|JPG|jpeg|JPEG|gif|GIF|png|PNG|bmp|BMP|tiff|TIFF))?)
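To see what the extra parentheses change, here is a minimal JavaScript sketch (not part of the original answer; the variable names `wrapped`, `url` and `match` are only for illustration). It runs the wrapped expression against the URL from the question and looks at the numbered groups:

```javascript
// The wrapped expression from above, copied as-is (no anchors, no flags).
const wrapped = /((http|ftp|https):\/\/([\w+?\.\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\-\\=\\+\\\\\/\\?\\.\\:\\;\\'\\,]*\.(?:jpg|JPG|jpeg|JPEG|gif|GIF|png|PNG|bmp|BMP|tiff|TIFF))?)/;

// The URL from the question.
const url =
  "https://cdn.vox-cdn.com/thumbor/C07imD1SHmAnbObkg-nJ92N6sD8=/0x0:4799x3199/920x613/" +
  "filters:focal(2017x1217:2783x1983):format(webp)/cdn.vox-cdn.com/uploads/" +
  "chorus_image/image/62871037/seattle.0.jpg";

const match = url.match(wrapped);

// match[0] is the full match; match[1] is the new outer group.
console.log(match[0] === url); // true
console.log(match[1] === url); // true

// The original groups are shifted by one; match[2] is now the protocol.
console.log(match[2]); // "https"
```

Note that in JavaScript `match[0]` already holds the full matched text, so the outer group mostly matters for tools or code that only read the numbered groups.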

EDIT: After assisting the OP in narrowing down the scope of their issue, a more appropriate regex statement would be something like this: /^(((http(s?))|((s?)ftp)):)([\w \D~!@#$%^&*\\_/-=+/?.:;',]){1,}\.(jpg|gif|png)$/i

Let's break this down. First, this says it must start with either 'http' with an optional 's' or, if that isn't there, 'ftp' with an optional 's' prefixing it to account for secure forms of FTP. This must be followed by a colon. The next set accepts just about any commonly used character or symbol in a URL path. Finally, it ensures that the expression ends with an actual image extension. Wrapping the expression in /{expression}/i indicates that the expression is case insensitive, so it will match either upper or lower case, in any combination.
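A quick way to check that breakdown is to run the expression against a few inputs. This is only a sketch in JavaScript: the `example.com` URLs are made up for the test, and `imageUrl` and `voxUrl` are illustrative names.

```javascript
// The anchored, case-insensitive expression from the EDIT, copied as-is.
const imageUrl = /^(((http(s?))|((s?)ftp)):)([\w \D~!@#$%^&*\\_/-=+/?.:;',]){1,}\.(jpg|gif|png)$/i;

// The URL from the question.
const voxUrl =
  "https://cdn.vox-cdn.com/thumbor/C07imD1SHmAnbObkg-nJ92N6sD8=/0x0:4799x3199/920x613/" +
  "filters:focal(2017x1217:2783x1983):format(webp)/cdn.vox-cdn.com/uploads/" +
  "chorus_image/image/62871037/seattle.0.jpg";

console.log(imageUrl.test(voxUrl));                           // true
console.log(imageUrl.test("https://example.com/photo.JPG"));  // true, thanks to the i flag
console.log(imageUrl.test("sftp://example.com/photo.gif"));   // true, optional 's' before ftp
console.log(imageUrl.test("https://example.com/index.html")); // false, not an image extension
console.log(imageUrl.test("https://example.com/photo.jpeg")); // false, .jpeg is not in the list
```

One thing to be aware of: because the character class contains both `\w` (which includes digits) and `\D` (every non-digit), it accepts practically any character, so the real filtering is done by the protocol prefix and the extension at the end.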

As a further note, you may also want to account for print formats such as .jpeg, .tif, etc.
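One possible way to cover those extra extensions is sketched below in JavaScript. This is an assumption-laden variant, not the expression from the answer: `isImageUrl` is a hypothetical helper name, the path is matched with `\S+` (any run of non-whitespace) instead of the explicit character class, and the extension list (`jpe?g`, `tiff?`, `bmp`) is just one reasonable choice.

```javascript
// Hypothetical helper: checks the protocol prefix and a wider list of extensions.
// jpe?g covers .jpg and .jpeg; tiff? covers .tif and .tiff.
function isImageUrl(candidate) {
  const pattern = /^(?:https?|s?ftp):\/\/\S+\.(?:jpe?g|gif|png|bmp|tiff?)$/i;
  return pattern.test(candidate);
}

// The example.com URLs are made up for illustration.
console.log(isImageUrl("https://example.com/photo.jpeg")); // true
console.log(isImageUrl("https://example.com/scan.tif"));   // true
console.log(isImageUrl("ftp://example.com/scan.TIFF"));    // true
console.log(isImageUrl("https://example.com/logo.svg"));   // false
```

Adjust the extension list to whatever formats you actually need to accept.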

  • Thanks @NickGasiaRobitsch, I'm still curious what a simpler way would be. Would you be able to share? – Tlink Nov 03 '19 at 19:34
  • In order to know how best to simplify this, I would first need to know exactly what the purpose of this statement is. For example, if this statement is to match a URL starting with http, https, etc., or a particular format, each might look different. The same goes if you wanted to parse out a key when it matches a certain pattern. Regex is super flexible like that. I would start with the case-insensitive modifier, as Toto suggested above. [Here is a reference](https://stackoverflow.com/questions/3939715/case-insensitive-regex-in-javascript) on how to do that. Basically you just wrap it as ```/{expression}/i``` – Nick Gasia Robitsch Nov 03 '19 at 20:24
  • Thanks @NickGasiaRobitsch for the info and the reference. For this statement I want to include all image URLs from ftp, http and https. My challenge is that many image URLs present their own "surprises": parentheses, parameters, etc. This is where my expression drops the ball and determines that they're invalid, when in reality they are valid. – Tlink Nov 04 '19 at 16:37
  • @Tlink I updated the answer with a more complete explanation of what is actually going on there, and how to better optimize it. See my comment on other image formats and determine if that applies in your scenario, otherwise, you should be good to go. – Nick Gasia Robitsch Nov 04 '19 at 17:12