
I have an HTML string from which I need to grab the src and href values when they match a pattern. The pattern is:

/myfolder/[gu-id]/[image,file]

The GUID has the form '65f2383b-de39-4a9c-8e8c-de1c06e469ca'. The image or file can be any of jpg, gif, pdf, doc, xlsx, png, txt, zip and so on.

My current regex is this: ((\/myfolder\/[({]?[a-fA-F0-9]{8}[-]?([a-fA-F0-9]{4}[-]?){3}[a-fA-F0-9]{12}[})]?\/?.*\.(?:png|jpg|pdf|gif|jpeg|xls|xlsx|word|doc|txt|zip)))

But in a string that contains multiple files, the match starts at the first file and runs all the way to the end of the last one, so I get a single match that spans several files instead of one match per file.

How can I make it match every file separately, instead of producing only one match?

brother

1 Answer

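The .* is greedy: it runs to the end of the line and then backtracks only as far as the last file extension, which is why the match ends at the last file. Instead of .* you could match non-whitespace characters with \S*, so the match cannot continue past the first whitespace.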

As a minor note, if you are not using the capturing groups for further processing and only want the full match, you could omit them. The single hyphen in the character class [-]? can be written as -?. In .NET, the forward slash does not need escaping, so \/ can be written as /.

The alternation could be shortened a bit to (?:png|pdf|gif|jpe?g|xlsx?|word|doc|txt|zip), where jpe?g covers jpg and jpeg and xlsx? covers xls and xlsx.

You could update the pattern to:

/myfolder/[({]?[a-fA-F0-9]{8}-?(?:[a-fA-F0-9]{4}-?){3}[a-fA-F0-9]{12}[})]?/?\S*\.(?:png|pdf|gif|jpe?g|xlsx?|word|doc|txt|zip)
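To see the difference between the two patterns, here is a minimal sketch. It uses Python's re module rather than .NET, on the assumption that both engines treat this particular syntax the same way. The sample HTML, the file names and the helper `find_files` are made up for illustration.

```python
import re

# Original pattern from the question: the greedy .* can run across several files.
ORIGINAL = (
    r"((\/myfolder\/[({]?[a-fA-F0-9]{8}[-]?([a-fA-F0-9]{4}[-]?){3}"
    r"[a-fA-F0-9]{12}[})]?\/?.*\.(?:png|jpg|pdf|gif|jpeg|xls|xlsx|word|doc|txt|zip)))"
)

# Updated pattern from the answer: \S* cannot cross whitespace.
UPDATED = (
    r"/myfolder/[({]?[a-fA-F0-9]{8}-?(?:[a-fA-F0-9]{4}-?){3}"
    r"[a-fA-F0-9]{12}[})]?/?\S*\.(?:png|pdf|gif|jpe?g|xlsx?|word|doc|txt|zip)"
)

# Hypothetical input: two files on one line, separated by a space.
html = (
    '<img src="/myfolder/65f2383b-de39-4a9c-8e8c-de1c06e469ca/photo.jpg"> '
    '<a href="/myfolder/65f2383b-de39-4a9c-8e8c-de1c06e469ca/report.pdf">Report</a>'
)


def find_files(pattern: str, text: str) -> list[str]:
    """Return the full text of every match, ignoring capture groups."""
    return [m.group(0) for m in re.finditer(pattern, text)]


original_matches = find_files(ORIGINAL, html)
updated_matches = find_files(UPDATED, html)

print(len(original_matches))  # one match that spans both files
for match in updated_matches:
    print(match)              # one line per file
```

Note that \S* only separates matches when whitespace sits between them. If two URLs could appear back to back with no whitespace in between, a narrower class such as [^\s"']* might be a safer choice, though that goes beyond what the answer proposes.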

.NET Regex demo

The fourth bird