
Input:

http://foo/bar/baz/../../qux/

Desired Output:

http://foo/qux/

This can be achieved using a regular expression (unless someone can suggest a more efficient alternative).

If it were a forward look-up, it would be as simple as:

/\.\.\/[^\/]+/

However, I am not familiar with how to do a backward look-up for the preceding "/" (i.e., without doing /[a-z0-9-_]+\/\.\./).
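As a quick illustration of why the forward pattern does not help, here is a Python check of what `\.\./[^/]+` actually matches on the sample input. Python's `re` module stands in for PCRE here, and the PHP /.../ delimiters are dropped:

```python
import re

url = "http://foo/bar/baz/../../qux/"

# The question's forward pattern, written without the PHP /.../ delimiters.
forward = re.compile(r"\.\./[^/]+")

# [^/]+ is greedy and grabs the *next* segment (here the second ".."),
# not the directory name that comes before the first "../".
matches = forward.findall(url)
print(matches)  # ['../..']
```

The directory that should disappear (`baz`) sits to the left of `../`, which is why the answers below put the directory name first in the pattern.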

One solution I thought of is to use strrev, apply the forward look-up regex (the first example), and then strrev the result again. I am sure there is a more efficient way, though.

Gajus

2 Answers


Not the clearest question I've ever seen, but if I understand what you're asking, I think you only need to switch around what you have, like this:

/[^\/]+/\.\./

...then replace that with a /

Repeat that until no replacements are made, and you should have what you want.
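The repeat-until-no-replacements loop can be sketched in Python as below. This is a port of the idea rather than the answerer's code: `re.subn` (which returns the new string and the number of replacements) stands in for `preg_replace`, and the name `collapse_dot_segments` is hypothetical.

```python
import re

# "/" + one directory name + "/../", using this answer's [^/]+ for the name.
PARENT_STEP = re.compile(r"/[^/]+/\.\./")

def collapse_dot_segments(url: str) -> str:
    """Replace "/dir/../" with "/" repeatedly until nothing changes."""
    while True:
        url, count = PARENT_STEP.subn("/", url)
        if count == 0:
            return url

print(collapse_dot_segments("http://foo/bar/baz/../../qux/"))  # http://foo/qux/
```

Because `[^/]+` also matches `..`, a single pass can pair two `..` segments with each other, so `/aaa/bbb/ccc/ddd/../../../../` ends up as `/aaa/bbb/` rather than `/`, and a path without a trailing slash such as `/aaa/bbb/ccc/..` is not touched at all.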

EDIT

Your attempt seems to match a forward slash / and two dots \.\. followed by a slash / (or \/; they should both match the same thing), then one or more non-slash characters [^/]+, terminated by a slash /. Flipping it around, you want to find a slash followed by one or more non-slash characters and a terminating slash, then two dots and a final slash.

You may be confused into thinking that the regex engine parses and consumes things as it goes (so you wouldn't want to consume a directory name that is not followed by the correct number of dots), but that's not how it typically works - a regex engine matches the entire expression before it replaces or returns anything. So, you can have two dots followed by a directory name, or a directory name followed by two dots - it doesn't make a difference to the engine.

If your attempt uses the slash-enclosed, Perl-style syntax, then you would of course need to use \/ for any slashes you're trying to match, such as the middle one. I would also recommend matching and replacing the enclosing slashes in the URL as well. I think the PHP would be something like:

preg_replace('/\/[^\/]+\/\.\.\//', '/', $input)

(I'm not sure about that.)
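A single call replaces every non-overlapping match once, but it does not revisit the text it produced, so the sample input still contains one `dir/../` pair afterwards. A Python equivalent of that one call (`re.sub` standing in for `preg_replace`, which also replaces all matches by default) shows the intermediate result:

```python
import re

url = "http://foo/bar/baz/../../qux/"

# Same pattern as the preg_replace call above, minus the PHP delimiters.
once = re.sub(r"/[^/]+/\.\./", "/", url)
print(once)   # http://foo/bar/../qux/

# A second pass removes the pair that the first pass exposed.
twice = re.sub(r"/[^/]+/\.\./", "/", once)
print(twice)  # http://foo/qux/
```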

Code Jockey
  • Doesn't work on any legal path without trailing slash: `/aaa/bbb/ccc/..`. Doesn't work on path like this: `/aaa/bbb/ccc/ddd/../../../../` – NoSkill Sep 10 '22 at 23:22

Technically, what you want is to replace segments like '/path1/path2/../../' with '/'. Doing that in one pass requires matching ('pathx/')^n('../')^n, which is definitely NOT a regular language (it is context-free)... but most regex libraries support some non-regular languages and can (with a lot of effort) handle this kind of language.

An easy way to solve it is to stay within regular expressions and apply the replacement several times, replacing '/[^./]+/\.\./' with '/'.
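A Python sketch of that loop, assuming the dots in `../` are escaped and each match is replaced with '/' (replacing with '' would also delete the slash that separates the remaining segments). Python's `re` stands in for PCRE, and the function name `collapse_no_dot_dirs` is hypothetical:

```python
import re

# "[^./]+" excludes dots, so a ".." segment is never mistaken for a directory.
PARENT_STEP = re.compile(r"/[^./]+/\.\./")

def collapse_no_dot_dirs(path: str) -> str:
    """Remove "/dir/../" pairs, repeating until no replacement is made."""
    while True:
        path, count = PARENT_STEP.subn("/", path)
        if count == 0:
            return path

print(collapse_no_dot_dirs("http://foo/bar/baz/../../qux/"))  # http://foo/qux/
print(collapse_no_dot_dirs("/aaa/bbb/ccc/ddd/../../../../"))  # /
```

As the answer notes further down, names containing dots are left alone: `/a/path.1/../` comes back unchanged.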

If you still want to do it in a single step, lookahead and grouping are needed, but it will be hard to write (I'm not very used to them, but I will try).

EDIT:

I've found a solution using only one regex, but it requires PCRE (it relies on the recursive subpattern call (?1)):

([^/.]+/(?1)?\.\./)

I based my solution on the following question: Match a^n b^n c^n (e.g. "aaabbbccc") using regular expressions (PCRE)

(Note that dots are "forbidden" in the first section, so you cannot have path.1/path.2/. Allowing them is considerably more complex, because you would have to admit dots while still rejecting '../' as a valid first section.)

This subexpression matches path names like 'path1/':

[^/.]+/

This subexpression matches the double dots:

\.\./

You can test the regex at https://www.debuggex.com/ (remember to set it to PCRE mode).

Here is a working copy: https://eval.in/52675

Qsebas
  • Doesn't work on any legal path without trailing slash: `/aaa/bbb/ccc/..`. Doesn't work on path like this: `/aaa/bbb/..ccc/../` – NoSkill Sep 10 '22 at 23:26
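Both comments point at inputs the regex approaches miss. A non-regex alternative, swapped in here and not taken from either answer, is to walk the path segments with a stack, much like the `remove_dot_segments` procedure in RFC 3986 (this sketch is simpler than the RFC's algorithm). It assumes an absolute path, uses Python's `urllib.parse` so the scheme, host, and query are left untouched, and the name `normalize_path` is hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_path(url: str) -> str:
    """Resolve "." and ".." segments in the (absolute) path of a URL."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    stack = []  # resolved segments; stack[0] == "" stands for the root
    for seg in segments:
        if seg == "..":
            # Go up one level, but never above the root.
            if len(stack) > 1:
                stack.pop()
        elif seg != ".":
            stack.append(seg)
    # A path ending in "." or ".." refers to a directory, so keep the slash.
    if segments[-1] in (".", ".."):
        stack.append("")
    return urlunsplit(parts._replace(path="/".join(stack)))

print(normalize_path("http://foo/bar/baz/../../qux/"))  # http://foo/qux/
print(normalize_path("/aaa/bbb/ccc/.."))                # /aaa/bbb/
print(normalize_path("/aaa/bbb/..ccc/../"))             # /aaa/bbb/
```

Splitting once and walking the segments needs no repeated passes and handles missing trailing slashes and names that merely start with dots; the trade-off is that it steps outside the regex-only approach the question asked about.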