I have a regex that matches URL paths where the second part of the path is a number.

(*/)([0-9]+)(/*)

However, I want to modify it to match only paths where the second part is a number and is more than 3 digits long, e.g.

/abcd/12345/abcd – Matched
/abcd/123/abcd – Not matched

Is there a way to specify length limits in regex?

Maven

2 Answers

2

A URL consists of URL-encoded ASCII characters, none of which are control characters (see the W3Schools URI Encoding reference).

The regex that matches any printable ASCII character (no control characters!) is the following (see this SO question):

[ -~]

Therefore, assuming you want to match the whole URL, you can use the following regex:

^[ -~]*\/\d{4,}\/?[ -~]*$
  • ^: Matches the beginning of a string
  • [ -~]: Any printable ASCII character
  • *: Match zero or more of the preceding token
  • \/: Slash, escaped because an unescaped / would end a JavaScript regex literal
  • \d: Regex class for digits, matches all digits 0-9
  • {4,}: Matches 4 or more of the preceding token (at least four digits)
  • ?: Matches 0 or 1 of the preceding token (there could be a slash at the end or not, both are matched)
  • $: Matches end of a string
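
The length limit itself comes from the brace quantifier. As a small sketch (the sample strings and variable names are made up for illustration), the three forms `{n}`, `{n,}` and `{n,m}` behave like this when the pattern is anchored at both ends:

```javascript
// Quantifier forms, anchored so the whole string must match.
const exactlyFour = /^\d{4}$/;  // exactly 4 digits
const fourOrMore = /^\d{4,}$/;  // 4 or more digits
const twoToFour = /^\d{2,4}$/;  // between 2 and 4 digits

console.log(exactlyFour.test("1234"));  // true
console.log(exactlyFour.test("12345")); // false
console.log(fourOrMore.test("12345"));  // true
console.log(fourOrMore.test("123"));    // false
console.log(twoToFour.test("123"));     // true
console.log(twoToFour.test("12345"));   // false
```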

const urls = [
  "/abcd/12345/abcd", // Matched
  "/12345/abcd",      // Matched
  "/abcd/123/abcd",   // Not matched - too few digits
  "12345/abcd",       // Not matched - must NOT start with a number (can be adjusted if required)
  "/abcd/12345",      // Matched - may end with a number (can be adjusted if required)
  "/abäd/1234"        // Not matched - invalid URL as 'ä' is a non-ASCII character
]

const isValidUrl = (url) => {
  const match = url.match(/^[ -~]*\/\d{4,}\/?[ -~]*$/);
  if(match === null) console.log(`URL ${url} does NOT match.`);
  else console.log(`Match found: ${match[0]}`);
}

urls.forEach(url => isValidUrl(url));

It's not 100% clear what exactly you want to match so you might need to adjust the regex to your needs. I suggest you use this RegEx as a starting point and use RegExr to refine it if required.

Mushroomator
  • Thank you, your answer addresses the actual question. However, based on your `can be adjusted if required` comments: is it possible to have a regex that matches the given condition (a number more than 3 digits long) anywhere in the URL, irrespective of whether it comes after the first, second, or nth `/`? – Maven Sep 28 '22 at 00:06
  • Two questions: Do you want the match to contain only those numbers, and do you want `12345/abdc` to match or only `/12345/abcd`? – Mushroomator Sep 28 '22 at 07:53
  • My end goal is actually to match numbers irrespective of where they appear and to replace them with some other text. For the first part, I understand we can mark separate groups in a regex using `marked subexpressions ()`, but I'm not sure how to dynamically determine which part of the URL is a number so I can replace it. – Maven Sep 28 '22 at 13:29
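
For the follow-up goal in these comments (replacing numeric segments wherever they appear), one possible sketch uses `String.prototype.replace` with a global regex. The helper name `maskLongNumbers` and the placeholder text `{id}` are made up for illustration, and the sketch assumes that only whole path segments consisting of 4 or more digits should be replaced:

```javascript
// Hypothetical helper: replaces every path segment made up of 4+ digits.
// Assumes a segment is delimited by "/" or by the start/end of the string.
// The lookahead (?=\/|$) leaves the trailing slash unconsumed, so two
// numeric segments in a row are both replaced.
const maskLongNumbers = (path, placeholder = "{id}") =>
  path.replace(/(^|\/)\d{4,}(?=\/|$)/g, (match, prefix) => prefix + placeholder);

console.log(maskLongNumbers("/abcd/12345/abcd"));  // /abcd/{id}/abcd
console.log(maskLongNumbers("/abcd/123/abcd"));    // /abcd/123/abcd
console.log(maskLongNumbers("/12345/abcd/67890")); // /{id}/abcd/{id}
```

Passing a function as the second argument to `replace` avoids any special handling of `$` sequences in the placeholder text.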
1

You may use this pattern:

 ^/[^/]+/\d{4,}

The trailing \d{4,} matches only 4 or more digits. Here is a demo.
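
As a quick check of this pattern against the paths from the question, here is a minimal sketch in JavaScript (the variable name `pattern` is made up, and the slashes are escaped only because of the regex literal syntax):

```javascript
// Pattern from this answer: a first segment, then a second segment
// that starts with 4 or more digits.
const pattern = /^\/[^\/]+\/\d{4,}/;

console.log(pattern.test("/abcd/12345/abcd")); // true
console.log(pattern.test("/abcd/123/abcd"));   // false
console.log(pattern.test("/12345/abcd"));      // false: the number is in the first segment
```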

Tim Biegeleisen