Regex to only catch numbers which are greater than n digits in length

Question

I have a regex that matches URL paths where there is a number in the second part of the path.

(*/)([0-9]+)(/*)

However, I want to modify it so that it only matches paths where the second part is a number that is more than 3 digits long, e.g.:

/abcd/12345/abcd – Matched
/abcd/123/abcd – Not matched

Is there a way to specify length limits in regex?

https://stackoverflow.com/a/17120435/5527985 – bobble bubble, Sep 27 '22 at 23:13

Mushroomator · Accepted Answer · 2022-09-27T23:37:59.960

A URL consists of URL-encoded ASCII characters that are not control characters (see the W3Schools URI Encoding reference).

The following regex matches any printable ASCII character, i.e. no control characters (see this SO question):

[ -~]
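As a quick sanity check (a sketch, not part of the original answer), `[ -~]` is the code-point range from space (U+0020) to tilde (U+007E), which is exactly the 95 printable ASCII characters. The snippet counts which of the code points 0 to 127 the class accepts, and shows that a path containing a non-ASCII character only passes after it has been percent-encoded with the built-in `encodeURI`. The variable names are illustrative.

```javascript
// Matches exactly one printable ASCII character: U+0020 (space) through U+007E (tilde).
const printableAscii = /^[ -~]$/;

// Collect every 7-bit ASCII code point that the class accepts.
const accepted = [];
for (let code = 0; code < 128; code++) {
  if (printableAscii.test(String.fromCharCode(code))) accepted.push(code);
}
console.log(accepted.length);                            // 95
console.log(accepted[0], accepted[accepted.length - 1]); // 32 126

// A raw non-ASCII path fails, but its percent-encoded form is pure printable ASCII.
const allPrintable = /^[ -~]*$/;
const raw = "/ab\u00e4d/1234"; // "/abäd/1234"
console.log(allPrintable.test(raw));            // false
console.log(encodeURI(raw));                    // /ab%C3%A4d/1234
console.log(allPrintable.test(encodeURI(raw))); // true
```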

Therefore, assuming you want to match the whole URL, you can use the following regex:

^[ -~]*\/\d{4,}\/?[ -~]*$

^: Matches the beginning of the string
[ -~]: Any printable ASCII character
*: Matches zero or more of the preceding token
\/: A slash; it must be escaped inside a JavaScript regex literal (/.../)
\d: Regex class for digits, matches any digit 0-9
{4,}: Matches 4 or more of the preceding token (i.e. more than three digits)
?: Matches 0 or 1 of the preceding token (a slash may or may not follow the number; both are matched)
$: Matches the end of the string

const urls = [
  "/abcd/12345/abcd", // Matched
  "/12345/abcd",      // Matched
  "/abcd/123/abcd",   // Not matched - too few digits
  "12345/abcd",       // Not matched - must not start with a number (can be adjusted if required)
  "/abcd/12345",      // Matched - may end with a number (can be adjusted if required)
  "/abäd/1234"        // Not matched - invalid URL as 'ä' is a non-ASCII character
];

const isValidUrl = (url) => {
  const match = url.match(/^[ -~]*\/\d{4,}\/?[ -~]*$/);
  if(match === null) console.log(`URL ${url} does NOT match.`);
  else console.log(`Match found: ${match[0]}`);
}

urls.forEach(url => isValidUrl(url));
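One caveat worth checking: because `\/?` is optional and is followed by `[ -~]*`, the digits do not have to fill the whole path segment, so a path such as `/abcd/12345abc` is matched as well. If the number must be the entire segment, one possible adjustment (a sketch, not taken from the answer) is to require either the end of the string or a new `/...` segment right after the digits. The names `original`, `strict` and `samples` are illustrative.

```javascript
// Pattern from the answer: after 4+ digits, anything printable may follow.
const original = /^[ -~]*\/\d{4,}\/?[ -~]*$/;
// Stricter sketch: after 4+ digits, either the string ends or a "/..." segment starts.
const strict = /^[ -~]*\/\d{4,}(?:\/[ -~]*)?$/;

const samples = ["/abcd/12345/abcd", "/abcd/12345", "/abcd/123/abcd", "/abcd/12345abc"];
for (const s of samples) {
  console.log(s, original.test(s), strict.test(s));
}
// /abcd/12345/abcd true true
// /abcd/12345 true true
// /abcd/123/abcd false false
// /abcd/12345abc true false
```

Using a non-capturing group `(?:...)` keeps the match result free of an extra capture that the original snippet does not need.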


It's not entirely clear what exactly you want to match, so you might need to adjust the regex to your needs. I suggest using this regex as a starting point and refining it with RegExr if required.

Thank you, your answer addresses the actual question. However, regarding your comments `can be adjusted if required`: is it possible to have a regex that matches a number longer than 3 digits anywhere in the URL, irrespective of whether it comes after the first, second, or nth `/`? – Maven, Sep 28 '22 at 00:06
Two questions: do you want the match to contain only those numbers, and do you want `12345/abcd` to match or only `/12345/abcd`? – Mushroomator, Sep 28 '22 at 07:53
My end goal is to match such numbers wherever they appear and replace them with some other text. I understand that we can mark separate groups in a regex using marked subexpressions `()`, but I'm not sure how to dynamically determine which part of the URL is a number so that I can replace it. – Maven, Sep 28 '22 at 13:29
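For the goal described in the last comment, one option (a sketch, not an answer given in the thread) is to let `String.prototype.replace` with the global `g` flag find every qualifying segment, instead of working out in advance which group holds the number. The capture group `(^|\/)` records whether the number sat at the very start or after a slash so the separator can be put back, and the lookahead `(?=\/|$)` requires the digits to run to the end of their segment. The function name `maskLongNumbers` and the `:id` placeholder are hypothetical.

```javascript
// Replace every path segment made up solely of 4 or more digits.
// maskLongNumbers and the default ":id" placeholder are illustrative names only.
const maskLongNumbers = (path, placeholder = ":id") =>
  path.replace(/(^|\/)\d{4,}(?=\/|$)/g, (match, separator) => separator + placeholder);

console.log(maskLongNumbers("/abcd/12345/abcd")); // /abcd/:id/abcd
console.log(maskLongNumbers("/1234/abcd/56789")); // /:id/abcd/:id
console.log(maskLongNumbers("12345/abcd"));       // :id/abcd
console.log(maskLongNumbers("/abcd/123/abcd"));   // /abcd/123/abcd (too few digits)
console.log(maskLongNumbers("/abcd/12345abc"));   // /abcd/12345abc (segment is not purely numeric)
```

Passing a function rather than a replacement string means the placeholder is inserted literally, so special sequences such as `$&` in it are not expanded. Because the lookahead does not consume the following slash, adjacent numeric segments like `/1234/5678` are both replaced.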

score 1 · Answer 2 · answered Sep 27 '22 at 23:15

You may use this pattern:

 ^/[^/]+/\d{4,}

The \d{4,} at the end of the regex matches only 4 or more digits. Here is a demo.
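This pattern is anchored only at the start, so it effectively checks a prefix: nothing after the digits is examined, and `/abcd/12345abc` matches too. A minimal sketch (an adjustment, not part of the answer) adds the lookahead `(?=\/|$)` so the number must be the entire second segment. The names `prefixOnly` and `wholeSegment` are illustrative.

```javascript
// Pattern from the answer: "/", a first segment, "/", then 4+ digits (prefix check only).
const prefixOnly = /^\/[^\/]+\/\d{4,}/;
// Sketch: additionally require "/" or the end of the string right after the digits.
const wholeSegment = /^\/[^\/]+\/\d{4,}(?=\/|$)/;

for (const path of ["/abcd/12345/abcd", "/abcd/123/abcd", "/abcd/12345abc", "/12345/abcd"]) {
  console.log(path, prefixOnly.test(path), wholeSegment.test(path));
}
// /abcd/12345/abcd true true
// /abcd/123/abcd false false
// /abcd/12345abc true false
// /12345/abcd false false
```

The last sample also shows how this pattern differs from the accepted answer: it targets the second segment specifically, so `/12345/abcd` is not matched.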

Tim Biegeleisen
