
I'm looking for a preg_match_all pattern that finds all URLs on a page that don't have a trailing slash.

For example, if I have:

<a href="/testing/abc/">end with slash</a>

<a href="/testing/test/mnl">no ending slash</a>

The result should be the second link only. A solution for that part is posted in "find pattern for url with no ending slash".

I have tried to modify the pattern provided there so that it also excludes URLs that contain 'images' or '.pdf', but have had no luck so far.

Thanks.

user2170712

2 Answers


This one should suit your needs (demo):

href="(?:(?<!images).(?!(?:[.]pdf|/)"))*?"
  • (?:) = non-capturing group
  • (?<!images). = any char not preceded by images
  • .(?!(?:[.]pdf|/)") = any char not followed by .pdf" nor by /"
  • *? = match as short as possible
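As a quick check of the pattern above, here is a minimal sketch in Python, whose `re` module accepts this pattern unchanged. The sample HTML is made up for illustration. In PHP the same pattern would go into preg_match_all with a delimiter that does not occur in it, such as `~`.

```python
import re

# Hypothetical sample page: only the second link should be reported.
html = """
<a href="/testing/abc/">end with slash</a>
<a href="/testing/test/mnl">no ending slash</a>
<a href="/images/logo">images</a>
<a href="/docs/file.pdf">pdf</a>
"""

# The pattern from the answer, unchanged.
pattern = re.compile(r'href="(?:(?<!images).(?!(?:[.]pdf|/)"))*?"')

# No capturing groups, so findall returns the full matched text.
found = pattern.findall(html)
print(found)  # ['href="/testing/test/mnl"']
```

One edge case to be aware of: the lookbehind only tests characters inside the value, so a URL that ends in "images", such as href="/images", still matches, because the closing quote itself is not checked.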
sp00m

I found a way to exclude links that end in .pdf by modifying the answer from the other question. I'm still working out why it doesn't also exclude the images example, though.

href=(['"])[^\s]+(?<![\/]|\.pdf)\1

Link to a working test http://www.rubular.com/r/jmBVstpGZD
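For anyone who wants to try this outside Rubular, here is a rough Python sketch. It uses the escaped `\.pdf` form suggested in the comments. Python's `re` rejects a lookbehind whose alternatives differ in width, so the single lookbehind is split into two, which should behave the same way in PCRE. The `no_images_either` pattern is my own assumption about one way to add the 'images' exclusion and is not part of the original answer. The sample HTML is invented.

```python
import re

# Hypothetical sample page.
html = """
<a href="/testing/abc/">end with slash</a>
<a href="/testing/test/mnl">no ending slash</a>
<a href="/images/logo">images</a>
<a href="/docs/file.pdf">pdf</a>
<a href="/pics/photo.bmp">bmp</a>
"""

# (?<!/) and (?<!\.pdf) stand in for the single lookbehind (?<![\/]|\.pdf).
no_slash_no_pdf = re.compile(r"""href=(['"])[^\s]+(?<!/)(?<!\.pdf)\1""")

# Assumption: a negative lookahead right after the opening quote rejects
# any value that contains "images" before the next quote character.
no_images_either = re.compile(
    r"""href=(['"])(?![^'"]*images)[^\s]+(?<!/)(?<!\.pdf)\1"""
)


def matches(pattern, text):
    """Return the full text of every match (group 0)."""
    return [m.group(0) for m in pattern.finditer(text)]


print(matches(no_slash_no_pdf, html))
print(matches(no_images_either, html))
```

The first pattern still reports the /images/logo link, because nothing in it looks at the middle of the URL. The second drops that link while keeping the .bmp one.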

Zack
  • This regex would reject .bmp and .tif as well. Can you figure out why? :) – Andrew Cheong Mar 19 '13 at 17:04
  • Hi Zack, how about also excluding URLs that have the string 'images' in them? – user2170712 Mar 19 '13 at 17:04
  • @acheong87 No I really have no idea oh cryptic master. :p I'm assuming you do know, and can provide some reasoning or possibly a link to why. – Zack Mar 19 '13 at 17:10
  • Haha, it's because `[.pdf]` is a character class that means "any _one_ of the characters, `.`, `p`, `d`, and `f`." You excluded anything that ended in one of those characters. What you meant is `\.pdf` without the square brackets. Sorry, didn't mean to leave you hangin'. – Andrew Cheong Mar 19 '13 at 17:59
  • @acheong87 I see what you mean now. I updated my answer just so if anyone saw it they wouldn't try to use it. And because it was incorrect. Thanks for your explanation! I don't use regex much and posted before I tested it I guess. – Zack Mar 19 '13 at 19:44
  • Nice. No worries; all here to learn. 'cept Jon Skeet, I mean. – Andrew Cheong Mar 20 '13 at 01:54
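The character-class point from the comments can be checked with a small Python sketch. The URL list is made up, and the two patterns are stripped down to just the lookbehind so the difference is easy to see.

```python
import re

# Hypothetical URLs to filter.
urls = ["/a/report.pdf", "/a/photo.bmp", "/a/scan.tif", "/a/page.html"]

# [.pdf] is a character class: exactly one of '.', 'p', 'd' or 'f'.
char_class = re.compile(r".+(?<![.pdf])$")

# \.pdf is the literal four-character suffix ".pdf".
literal = re.compile(r".+(?<!\.pdf)$")

kept_by_class = [u for u in urls if char_class.match(u)]
kept_by_literal = [u for u in urls if literal.match(u)]

print(kept_by_class)    # ['/a/page.html']
print(kept_by_literal)  # ['/a/photo.bmp', '/a/scan.tif', '/a/page.html']
```

With the character class, .bmp and .tif are rejected along with .pdf because they end in `p` and `f`. The literal version rejects only the .pdf URL.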