-1

I am useless at Regex and I want to remove parts of a URL that are not always consistent.

The URL might be:

www.test.com **/en/** restOfPath

or

www.test.com **/en/en_gb/** restOfPath

Then, depending on the country, the values might change to:

www.test.com **/es/** restOfPath

or

www.test.com **/es/es_es/** restOfPath

I am therefore looking to always remove the parts in bold, so that I can split the remainder of the path to create a logical naming scheme that is language/location agnostic.

I am doing this as a workaround to build out a data layer until the client can implement it properly when they launch their new website. I have managed to build an if/else statement as a workaround, which is a bit clunky, but I would like a cleaner solution.

Roman Rock
  • Generally we want to help people who've been working on a solution to solve a problem. Have you tried a regex solution for this? If not, maybe you should do some regex tutorials? – Alex Collins Sep 12 '17 at 14:38
  • Possible duplicate of [How do I parse a URL into hostname and path in javascript?](https://stackoverflow.com/questions/736513/how-do-i-parse-a-url-into-hostname-and-path-in-javascript) – Tim Biegeleisen Sep 12 '17 at 14:39
  • I'm not a JavaScript guru, but if you follow the link above you'll see that there are already some libraries out there which can help you to parse a URL/URI. I'd start by using those as much as possible, and only afterwards resort to using a regex. – Tim Biegeleisen Sep 12 '17 at 14:40
  • I used to be useless at regex as well. What helped me was experimenting with my problems on http://regexr.com/ until I found a solution that fit. Now I am not completely useless anymore. – ivospijker Sep 12 '17 at 14:40
  • You have to get and use a list of all those language abbreviations, otherwise the regex has no way of knowing them: `lan1(?:_X1)?|lan2(?:_X2)?|lan3(?:_X3)?| ..` , etc. –  Sep 12 '17 at 16:17
  • Thank you for the responses. @alex sorry if I was not clear on why I needed help. I was writing a temporary workaround in a tech spec for a client to build a page name for analytics. The answer above does not solve my question, and I have tried the two regexes below but they don't seem to do it either. I built an if/else statement as a workaround, but it is a bit clumsy. – Roman Rock Sep 12 '17 at 19:15
  • @RomanRock the regex I provided definitely matches the text in bold in your examples. See the example I provided in the link. What exactly is it you can't get working? – DNKROZ Sep 13 '17 at 13:51

2 Answers

2

Probably this will help you

`(?:\/([a-z]{2})(?:\/([a-z]{2}_[A-Z]{2}))?)`

This pattern finds the first / followed by two lowercase letters, optionally followed by another / and a locale of the form aa_AA.

I have put code samples for you on regex101.
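As a rough illustration, here is how a pattern along these lines might be applied in JavaScript to strip the prefix and split what is left. This is only a sketch under a few assumptions: the input is the path part of the URL (starting with /), the pattern is anchored to the start of that path, and the region half of the locale is accepted in either case, since the examples in the question use lowercase (en_gb) while the pattern above expects aa_AA. The function names `stripLocalePrefix` and `pageNameParts` are made up for this example.

```javascript
// Hypothetical helper: removes a leading /xx or /xx/xx_yy prefix from a path.
// Assumption: the locale region may be lower or upper case (en_gb or en_GB).
function stripLocalePrefix(path) {
  // ^\/[a-z]{2}                  leading two-letter language segment, e.g. /en
  // (?:\/[a-z]{2}_[a-zA-Z]{2})?  optional locale segment, e.g. /en_gb
  // (?=\/|$)                     must be followed by a slash or the end of the path
  return path.replace(/^\/[a-z]{2}(?:\/[a-z]{2}_[a-zA-Z]{2})?(?=\/|$)/, "");
}

// Hypothetical helper: splits the remaining path into its non-empty segments.
function pageNameParts(path) {
  return stripLocalePrefix(path).split("/").filter(Boolean);
}

const examples = [
  "/en/restOfPath",
  "/en/en_gb/restOfPath",
  "/es/restOfPath",
  "/es/es_es/restOfPath",
];
for (const p of examples) {
  console.log(p, "->", stripLocalePrefix(p));
}
```

Requiring a slash or the end of the path after the prefix keeps a path such as /products/en from being touched, because "pr" is not followed by a slash.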

Andrew Rumm
1

I believe this is what you're after:

`\/.*(?=\/.*?)`

https://regex101.com/r/OZIseI/4

It uses a positive lookahead so that the last / is not included in the match.
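To make the behaviour concrete, here is a small JavaScript sketch that applies this pattern unchanged to URLs shaped like the ones in the question. The helper name `removeMatch` is invented for the illustration. Because `.*` is greedy, the match runs from the first / up to (but not including) the last /, so this only gives the intended result when the remaining path is a single segment.

```javascript
// The pattern from this answer, unchanged.
const pattern = /\/.*(?=\/.*?)/;

// Hypothetical helper name, used only for this illustration.
function removeMatch(url) {
  // Without the g flag, replace() removes only the first match.
  return url.replace(pattern, "");
}

console.log(removeMatch("www.test.com/en/restOfPath"));       // www.test.com/restOfPath
console.log(removeMatch("www.test.com/es/es_es/restOfPath")); // www.test.com/restOfPath

// Caveat: with a deeper path the greedy .* also removes real path segments.
console.log(removeMatch("www.test.com/en/products/shoes"));   // www.test.com/shoes
```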


DNKROZ