Single regular expression that extracts a number from two different url formats?

Question

I am trying to create a single regular expression that I can use to extract the number from two different URLs in a PHP function. The formats of these URLs are:

/t/2121/title/

and

/top2121.html

I am bad at regular expressions and have already tried the following and many variants of it:

#^/t/(\d+?)/|/top(\d+?)\.html/#i

This is not doing anything, and I am still at a complete loss after reading many sites and tutorials on regular expressions. Is there a regular expression I could create that would allow me to extract the number regardless of which URL format is entered?
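A minimal PHP sketch that runs the pattern above against both URL formats (the helper name questionPatternMatches is hypothetical, for illustration only). As written, the first alternative matches /t/2121/title/, but the second alternative requires a / after .html, so /top2121.html does not match; the same URL with a trailing slash does.

```php
<?php
// The pattern from the question, copied verbatim.
const QUESTION_PATTERN = '#^/t/(\d+?)/|/top(\d+?)\.html/#i';

// Hypothetical helper: true when QUESTION_PATTERN matches $url.
function questionPatternMatches(string $url): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match(QUESTION_PATTERN, $url) === 1;
}

// The third sample adds the trailing "/" after ".html" that the second alternative requires.
foreach (['/t/2121/title/', '/top2121.html', '/top2121.html/'] as $url) {
    echo $url, ': ', questionPatternMatches($url) ? 'match' : 'no match', "\n";
}
```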

Looks like here's your answer: http://stackoverflow.com/questions/6604455/php-code-to-remove-everything-but-numbers — winkbrace, Feb 06 '13 at 19:31
I apologize. I entered the second url format incorrectly. I edited the question to include the proper format that I am working with. — John Tangale, Feb 06 '13 at 19:51
I suggest that you NOT do it as one regular expression unless you have a very specific reason why you must. Keep the two tasks separate for clarity. — Andy Lester, Feb 06 '13 at 22:38

score 1 · Answer 1 · answered Feb 06 '13 at 19:33

If you just want the first digits after t, regardless of the / in between, something like this might work: #t/?(\d+)#i

edit:

example: http://codepad.viper-7.com/0z3ee0
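A small PHP sketch of how this pattern behaves with preg_match() (the helper name firstDigitsAfterT is hypothetical). It finds 2121 in /t/2121/title/, but in /top2121.html no "t" is followed by an optional slash and digits, so it returns null; this is the mismatch the asker's follow-up comment points out.

```php
<?php
// Hypothetical helper wrapping Answer 1's pattern: a "t", an optional "/",
// then the first run of digits, captured as group 1.
function firstDigitsAfterT(string $url): ?string
{
    return preg_match('#t/?(\d+)#i', $url, $m) === 1 ? $m[1] : null;
}

foreach (['/t/2121/title/', '/top2121.html'] as $url) {
    var_dump(firstDigitsAfterT($url));
}
```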

Jonathan Kuhn

I messed up when I wrote the question. The second URL is in the format /top2121.html – John Tangale Feb 06 '13 at 19:51
Well, if all you want are the numbers, and those are the only numbers in the URI, then just regex for `/\d+/` – Jonathan Kuhn Feb 06 '13 at 20:07
As this can be user-submitted, I want to match the numbers but also make sure the submitted URL is valid. – John Tangale Feb 06 '13 at 20:09

CSᵠ · Accepted Answer · 2013-02-06T22:47:52.000

Regex to extract only the digits while also checking whether the URL matches the accepted formats:

#^\/t(?:\/(\d+)\/[a-z_-]+\/?|op(\d+)\.html)$#i

Edit: captures in 2 groups (group 1 for /t/2121/title/, group 2 for /top2121.html).

Explained demo here: http://regex101.com/r/dO5dI4
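A sketch, assuming PHP 7.1+ for the nullable return type, of how the two-group version might be wrapped in a function (topicIdTwoGroups is a hypothetical name). Because only one alternative can match, the code takes whichever of group 1 or group 2 is non-empty: when the second alternative matches, PHP reports group 1 as an empty string, and when the first matches, the unmatched trailing group 2 is simply left out of $m.

```php
<?php
// Accepted answer's first pattern: group 1 for /t/<id>/<title>/, group 2 for /top<id>.html.
const TOPIC_PATTERN_TWO_GROUPS = '#^\/t(?:\/(\d+)\/[a-z_-]+\/?|op(\d+)\.html)$#i';

// Hypothetical helper: the topic number, or null if $url is in neither accepted format.
function topicIdTwoGroups(string $url): ?int
{
    if (preg_match(TOPIC_PATTERN_TWO_GROUPS, $url, $m) !== 1) {
        return null;
    }
    // The alternatives start differently ("/" vs "op"), so only one can match.
    // If group 1 is empty (or absent), the digits must be in group 2.
    $digits = ($m[1] ?? '') !== '' ? $m[1] : $m[2];
    return (int) $digits;
}

foreach (['/t/2121/title/', '/top2121.html', '/blabla21051.html'] as $url) {
    var_dump(topicIdTwoGroups($url));
}
```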

Variant #2: captures in the same group by using a branch reset group, (?|...):

#^\/t(?|\/(\d+)\/[a-z_-]+\/?$|op(\d+)\.html$)#i

Explained demo here: http://regex101.com/r/cG9vC3
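With the branch reset group (?|...), each alternative numbers its capture starting from the same group, so the caller can always read $m[1]. A minimal sketch, assuming PHP 7.1+ (topicIdBranchReset is a hypothetical name):

```php
<?php
// Variant #2: inside (?|...) both alternatives put their digits in group 1.
const TOPIC_PATTERN_BRANCH_RESET = '#^\/t(?|\/(\d+)\/[a-z_-]+\/?$|op(\d+)\.html$)#i';

// Hypothetical helper: the topic number, or null if $url is in neither accepted format.
function topicIdBranchReset(string $url): ?int
{
    return preg_match(TOPIC_PATTERN_BRANCH_RESET, $url, $m) === 1 ? (int) $m[1] : null;
}

// The last sample fails because "$" requires the URL to end right after ".html".
foreach (['/t/2121/title/', '/top2121.html', '/top2121.html?x=1'] as $url) {
    var_dump(topicIdBranchReset($url));
}
```

This also answers the follow-up comment below about getting the number at the same index regardless of which format matched.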

edited Feb 06 '13 at 22:47

answered Feb 06 '13 at 21:30

CSᵠ

Thanks. /blabla21051.html should not work, though. It is not a valid URL structure as per the question. Also, is there a reason why we limit the digits to 4 or 5? I would prefer not to limit them at all. This forum already has over a couple hundred thousand topics. – John Tangale Feb 06 '13 at 22:10

Great, that is exactly what I needed! Also, thanks for the site. Excellent place to learn and test regexps! – John Tangale Feb 06 '13 at 22:24
I just realized that the above regexp will put the matches in different indexes: the number is the first match in /t/1231/asdas/ and the second in /top1231.html. Is there any way to rework it so that the number is at the same index regardless of which pattern matched? – John Tangale Feb 06 '13 at 22:37

score 0 · Answer 3 · answered Feb 06 '13 at 20:55

I was able to get this regexp to match both types of url formats:

#^/(?:(?:t/)|(?:top))(\d+)(?:(?:\.html)|(?:/))#i

If anyone has a more efficient way of performing the same regexp, I would love to hear it.
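A PHP sketch comparing this pattern with a shorter spelling that drops the redundant inner (?:...) groups (the constant and function names are hypothetical). The two should behave identically; the shorter form is mainly easier to read, not a measured speedup. Because there is no $ anchor, only the start of the URL is checked, so /top2121.html.bak also matches.

```php
<?php
// The asker's pattern, and an equivalent spelling without the nested groups.
const ASKER_PATTERN = '#^/(?:(?:t/)|(?:top))(\d+)(?:(?:\.html)|(?:/))#i';
const ASKER_PATTERN_SHORT = '#^/(?:t/|top)(\d+)(?:\.html|/)#i';

// Hypothetical helper: group 1 of $pattern applied to $url, or null on no match.
function topicIdWith(string $pattern, string $url): ?string
{
    return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
}

$samples = ['/t/2121/title/', '/top2121.html', '/top2121.html.bak', '/x/2121/'];
foreach ($samples as $url) {
    // Both patterns should produce the same result for every sample.
    var_dump(topicIdWith(ASKER_PATTERN, $url), topicIdWith(ASKER_PATTERN_SHORT, $url));
}
```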

John Tangale

score 0 · Answer 4 · answered Feb 06 '13 at 20:57

If you get either of these URLs, you could use this expression. The number will be stored in the second capture group:

#^/t(op|/)(\d+)(\.html|/.*)#i
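A hedged PHP sketch (hypothetical function names) that reads group 2 from this pattern. Making the first and third groups non-capturing with (?:...) moves the number into group 1 instead; that variant is an editorial suggestion, not part of the original answer.

```php
<?php
// Answer 4's pattern: group 1 is "op" or "/", group 2 the digits, group 3 the rest.
function topicIdGroupTwo(string $url): ?string
{
    return preg_match('#^/t(op|/)(\d+)(\.html|/.*)#i', $url, $m) === 1 ? $m[2] : null;
}

// Editorial variant: with the outer groups non-capturing, the digits are group 1.
function topicIdGroupOne(string $url): ?string
{
    return preg_match('#^/t(?:op|/)(\d+)(?:\.html|/.*)#i', $url, $m) === 1 ? $m[1] : null;
}

foreach (['/t/2121/title/', '/top2121.html'] as $url) {
    var_dump(topicIdGroupTwo($url), topicIdGroupOne($url));
}
```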

Bjørne Malmanger

score 0 · Answer 5 · answered Feb 06 '13 at 21:03

Are there ever going to be numbers in the URL that you don't care about? If not, you can keep this simple by just capturing the numbers and ignoring the rest:

#(\d+)#
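A minimal PHP sketch (hypothetical helper firstNumber) showing the trade-off: the pattern finds the number in both formats, but it also returns digits from any other string, so it does not validate the URL structure the asker asked about in the comments.

```php
<?php
// Answer 5's pattern: the first run of digits anywhere in the string, as group 1.
function firstNumber(string $url): ?string
{
    return preg_match('#(\d+)#', $url, $m) === 1 ? $m[1] : null;
}

// The last sample is not a topic URL, yet a number is still returned.
foreach (['/t/2121/title/', '/top2121.html', '/not/a/topic/99'] as $url) {
    var_dump(firstNumber($url));
}
```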

girasquid
