
I have some URLs like:

dir-1
dir-1/dir-2
dir-1/dir-2/dir-3
dir-1/dir-2/dir-3/dir-[n]..and so on..

My current regex (with PHP) looks like this:

/^([[:lower:][:digit:]\-\/]+)$/

This regex matches all of these URLs. In my case, I need it to match only the first and second forms, i.e. paths with no slash or exactly one slash.

I tried multiple times to figure out the right way, but with no result.

tchrist
marvin
  • A little more information as to why you need to specifically use regex would be helpful. Thus far I'm thinking the explode answer is the best unless there is a specific reason you -must- use regex. – Blake Mar 06 '12 at 20:45
  • I currently have about 20,000 URLs (and in the future many, many more) which have combinations of the first and second version, followed by static directory levels. Examples: /lastname-1/dates/ /lastname-2/dates/ /lastname-1/firstname-1/dates/ and so on. I am trying to figure out the right way to handle this, but I am still in development :) For now, the first answer from Amber is the right one. – marvin Mar 06 '12 at 21:17
  • Are you mining this from a static text source, or is it something that you generate? If it's something that you generate in php, I still like the explode answer below of @Lix. I'm just trying to get an idea of why you specifically need regex (since in most cases it can (and should) be avoided). – Blake Mar 06 '12 at 21:22
  • The URLs were generated by data stored in a database. The database grows as the content grows. The URLs were in some way static and in some way dynamic. I think I will try the explode-method, combined with switches and if-conditions to avoid regex. Fortunately I am in the position to try another method before it goes in production :) – marvin Mar 06 '12 at 22:10
  • Marv - If you decided to use my solution in the end you might want to consider changing your accepted answer :) – Lix Mar 08 '12 at 09:34

2 Answers


Just match that set of characters (minus a /), then an optional /, then that set of characters again (optionally).

/^([[:lower:][:digit:]-]+\/?[[:lower:][:digit:]-]*)$/
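
A quick way to try this pattern from PHP is with preg_match(). The sketch below is illustrative only: the constant names, the helper matchesAtMostOneSlash() and the sample paths are made up for the example. Note that, as written, the pattern also accepts a trailing slash such as dir-1/; the second pattern is one possible stricter variant and is not part of the answer itself.

```php
<?php
// Pattern from the answer: one segment, an optional slash, an optional second segment.
const ONE_SLASH_PATTERN = '/^([[:lower:][:digit:]-]+\/?[[:lower:][:digit:]-]*)$/';

// Hypothetical stricter variant: a slash is only allowed when a second segment follows it.
const STRICT_ONE_SLASH_PATTERN = '/^[[:lower:][:digit:]-]+(?:\/[[:lower:][:digit:]-]+)?$/';

// Hypothetical helper: true when $path matches the given pattern.
function matchesAtMostOneSlash(string $path, string $pattern = ONE_SLASH_PATTERN): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $path) === 1;
}

foreach (['dir-1', 'dir-1/dir-2', 'dir-1/dir-2/dir-3', 'dir-1/'] as $path) {
    printf(
        "%-20s answer: %s  strict: %s\n",
        $path,
        var_export(matchesAtMostOneSlash($path), true),
        var_export(matchesAtMostOneSlash($path, STRICT_ONE_SLASH_PATTERN), true)
    );
}
```

The two patterns only disagree on dir-1/, so which one fits depends on whether a trailing slash should count as valid.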
Amber

You could just explode the string using the slash as the delimiter:

$str = "dir-1/dir-2";
$splitArr = explode('/', $str);
  • If the resulting array has more than one element then a slash is present.
  • More than two elements == more than one slash!
array (
  0 => 'dir-1',
  1 => 'dir-2',
)  
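
The same check can be wrapped in a small function. This is only a sketch: the names hasAtMostOneSlash() and countSlashes() are hypothetical, and it assumes the path has no leading or trailing slash, as in the examples from the question. substr_count() gives the same answer without building an array.

```php
<?php
// Hypothetical helper: true when $path contains no slash or exactly one slash.
function hasAtMostOneSlash(string $path): bool
{
    // explode() returns one more element than there are delimiters,
    // so at most two elements means at most one slash.
    return count(explode('/', $path)) <= 2;
}

// Alternative sketch: count the slashes directly.
function countSlashes(string $path): int
{
    return substr_count($path, '/');
}

var_dump(hasAtMostOneSlash('dir-1/dir-2'));       // one slash
var_dump(hasAtMostOneSlash('dir-1/dir-2/dir-3')); // two slashes
```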

As your question states, you are not talking about any old string variable but specifically URLs, so make sure to remove the protocol (http or https) and host name from the beginning of the string:

Eg: https://marvin.com/dir-1/dir-2

You could possibly use the $_SERVER['REQUEST_URI'] variable. It holds the path of the current request (plus any query string), relative to the site root.

So the $_SERVER['REQUEST_URI'] of https://marvin.com/dir-1/dir-2 would be:

/dir-1/dir-2 (look familiar?)

After the explode() you could also use array_shift() to drop the empty first element produced by the leading root slash, and
HEY PRESTO! you have avoided using regular expressions!
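
Putting those steps together might look like the sketch below. The function name pathSegments() is invented for illustration, and it takes the URI as a parameter (rather than reading $_SERVER['REQUEST_URI'] directly) so that it can be tried from the command line. It uses parse_url() to drop the scheme, host and any query string, and trim() instead of array_shift() so that a trailing slash is handled as well.

```php
<?php
// Hypothetical helper: split a URL or request URI into its directory segments.
function pathSegments(string $uri): array
{
    // Keep only the path component; this discards scheme, host and query string.
    $path = parse_url($uri, PHP_URL_PATH);
    if (!is_string($path)) {
        return []; // no path component, or the URI could not be parsed
    }

    // Remove leading and trailing slashes so explode() yields no empty elements at the ends.
    $trimmed = trim($path, '/');

    return $trimmed === '' ? [] : explode('/', $trimmed);
}

$segments = pathSegments('https://marvin.com/dir-1/dir-2');
print_r($segments);

// One or two segments corresponds to "none or one slash" between directories.
var_dump(count($segments) >= 1 && count($segments) <= 2);
```

In a web request, the same function could be called with $_SERVER['REQUEST_URI'] as its argument.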

Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

Community
Lix