1

I am trying to create a single regular expression that I can use to extract the number from two different URLs in a PHP function. The URLs have these formats:

/t/2121/title/

and

/top2121.html

I am bad at regular expressions and have already tried the following and many variants of it:

#^/t/(\d+?)/|/top(\d+?)\.html/#i

This is not doing anything, and I am still at a complete loss after reading many sites and tutorials on regular expressions. Is there a regular expression I could create that would allow me to extract the number regardless of the URL format entered?

John Tangale

5 Answers

1

If you just want the first digits after the t, whether it is followed by / or by op, something like this might work: #t(?:/|op)(\d+)#i

edit:

example: http://codepad.viper-7.com/0z3ee0

Jonathan Kuhn
1

A regex that extracts only the digits while also checking that the URL matches one of the accepted formats:

Variant #1: captures in two separate groups

#^\/t(?:\/(\d+)\/[a-z_-]+\/?|op(\d+)\.html)$#i

Explained demo here: http://regex101.com/r/dO5dI4

Variant #2: captures in the same group

#^\/t(?|\/(\d+)\/[a-z_-]+\/?$|op(\d+)\.html$)#i

Explained demo here: http://regex101.com/r/cG9vC3
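A minimal sketch of how variant #2 might be called from PHP. The helper name extractTopicId is hypothetical and not part of the answer, and the nullable return type assumes PHP 7.1 or later. Because (?|...) is a branch-reset group, both alternatives number their capture as group 1, so the caller reads $m[1] whichever format matched.

```php
<?php
// Hypothetical helper: returns the topic number as a string, or null if the
// URL matches neither accepted format.
function extractTopicId(string $url): ?string
{
    // Branch reset (?|...) makes both alternatives share capture group 1.
    $pattern = '#^\/t(?|\/(\d+)\/[a-z_-]+\/?$|op(\d+)\.html$)#i';

    if (preg_match($pattern, $url, $m) === 1) {
        return $m[1];
    }
    return null;
}

var_dump(extractTopicId('/t/2121/title/'));    // string(4) "2121"
var_dump(extractTopicId('/top2121.html'));     // string(4) "2121"
var_dump(extractTopicId('/blabla21051.html')); // NULL
```

With variant #1 the same call would leave the number in $m[1] for the first format and $m[2] for the second, which is the index difference the branch reset avoids.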

CSᵠ
  • Thanks. /blabla21051.html should not work though. It is not a valid URL structure as per the question. Also, is there a reason why we limit the digits to 4 or 5? I would prefer not to limit them at all. This forum already has well over 100k topics. – John Tangale Feb 06 '13 at 22:10
  • Great, that is exactly what I needed! Also, thanks for the site. Excellent place to learn and test regexp! – John Tangale Feb 06 '13 at 22:24
  • I just realized that the above regexp will have the matches done in different indexes. It will be the first match in /t/1231/asdas/ and the second in /top1231.html. Is there any way to rework it so that it would be the same index regardless of the pattern matched? – John Tangale Feb 06 '13 at 22:37
0

I was able to get this regexp to match both types of url formats:

#^/(?:(?:t/)|(?:top))(\d+)(?:(?:\.html)|(?:/))#i

If anyone has a more efficient way of performing the same regexp, I would love to hear it.
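As a rough check of the pattern above, the sketch below runs it next to a shorter form with the redundant inner groups removed. The shorter pattern and the helper name firstGroup are illustrations, not something from this post, and the nullable return type assumes PHP 7.1 or later.

```php
<?php
// Pattern from the post, and a shorter form with the redundant groups removed
// (the shorter form is an illustration, not taken from the post).
$original   = '#^/(?:(?:t/)|(?:top))(\d+)(?:(?:\.html)|(?:/))#i';
$simplified = '#^/t(?:/|op)(\d+)(?:\.html|/)#i';

// Hypothetical helper: returns capture group 1 of $pattern for $url, or null.
function firstGroup(string $pattern, string $url): ?string
{
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

// Both patterns should agree on matching and non-matching URLs alike.
foreach (['/t/2121/title/', '/top2121.html', '/blabla21051.html'] as $url) {
    var_dump(firstGroup($original, $url) === firstGroup($simplified, $url));
}
```

Both patterns keep the number in group 1 because every other group is non-capturing. Any speed difference between them is likely negligible for URLs this short.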

John Tangale
0

If you have either one of these URLs, you could use this expression. The number will be in the second capture group:

#^/t(op|/)(\d+)(\.html|/.*)#i
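A short sketch of reading that second group with preg_match, using the two URLs from the question as sample input:

```php
<?php
$pattern = '#^/t(op|/)(\d+)(\.html|/.*)#i';

foreach (['/t/2121/title/', '/top2121.html'] as $url) {
    if (preg_match($pattern, $url, $m) === 1) {
        // $m[0] is the whole match, $m[1] is "op" or "/", $m[2] is the number.
        echo $url, ' => ', $m[2], PHP_EOL;
    }
}
```

The number sits at index 2 only because the first pair of parentheses is a capturing group.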
Bjørne Malmanger
0

Are there ever going to be numbers in the URL that you don't care about? If not, you can keep this simple by just capturing the numbers and ignoring the rest:

#(\d+)#
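A minimal sketch of what this pattern captures, assuming a hypothetical helper named firstNumber and PHP 7.1 or later for the nullable return type. The third URL is made up to show the caveat raised above: when an unrelated number comes first, that number is what gets captured.

```php
<?php
// Hypothetical helper: returns the first run of digits in $url, or null.
function firstNumber(string $url): ?string
{
    return preg_match('#(\d+)#', $url, $m) === 1 ? $m[1] : null;
}

var_dump(firstNumber('/t/2121/title/'));     // string(4) "2121"
var_dump(firstNumber('/top2121.html'));      // string(4) "2121"
// Made-up URL with an earlier, unrelated number: the wrong digits are captured.
var_dump(firstNumber('/2013/top2121.html')); // string(4) "2013"
```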
girasquid