I have a string that follows this URL pattern:

https://www.examples.org/activity-group/8712/activity/202803
// note: the ending of the URL can differ
https://www.examples.org/activity-group/8712/activity/202803?ref=bla
https://www.examples.org/activity-group/8712/activity/202803/something

I'm trying to write a regex that matches

https://www.examples.org/activity-group/{number}/activity/{number}*

Where {number} is an integer of length 1 to 10.

How can I define a regex that checks the string pattern and verifies that each number is at the right position in the string?

Background: in a Google Form, I want to validate an answer by requiring people to enter a URL in this format, hence the regular expression.

For URLs that do not match this format, the regex should not match. For example: https://www.notthesite.org/group/8712/activity/astring

I went through several examples, but they match only if the number is present in the string.

Raymond Chenon

3 Answers

^https:\/\/www\.examples\.org\/activity-group\/[0-9]{1,10}\/activity\/[0-9]{1,10}(\/[a-z]+)*((\?[a-z]+=[a-zA-Z0-9]+)(\&[a-z]+=[a-zA-Z0-9]+)*)*$

  • ^ - start of string
  • \ - escape character
  • [0-9] - a digit
  • {1,10} - between one and ten of the previous items
  • (\/[a-z]+)* - Allow additional URL segments
  • ((\?[a-z]+=[a-zA-Z0-9]+)(\&[a-z]+=[a-zA-Z0-9]+)*)* - Allow query parameters with first parameter using a ? and all others using &
  • $ - end of string

This assumes that URL segments and query parameter keys consist of lowercase letters only. Query parameter values may contain lowercase letters, uppercase letters, or digits.
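To try the pattern outside of a form, here is a minimal Python sketch. It assumes Python's `re` module behaves like the form's regex engine for this pattern, which is not guaranteed, and the helper name `is_valid_activity_url` is hypothetical.

```python
import re

# Pattern from the answer above, written as a raw string.
# "/" and "&" need no escaping in Python, so those backslashes are dropped.
URL_PATTERN = re.compile(
    r"^https://www\.examples\.org/activity-group/[0-9]{1,10}/activity/[0-9]{1,10}"
    r"(/[a-z]+)*"
    r"((\?[a-z]+=[a-zA-Z0-9]+)(&[a-z]+=[a-zA-Z0-9]+)*)*$"
)


def is_valid_activity_url(url: str) -> bool:
    """Return True when the whole string matches the expected URL format."""
    return URL_PATTERN.fullmatch(url) is not None


if __name__ == "__main__":
    samples = [
        "https://www.examples.org/activity-group/8712/activity/202803",
        "https://www.examples.org/activity-group/8712/activity/202803?ref=bla",
        "https://www.examples.org/activity-group/8712/activity/202803/something",
        "https://www.notthesite.org/group/8712/activity/astring",
    ]
    for url in samples:
        print(is_valid_activity_url(url), url)
```

The first three sample URLs from the question are accepted and the last one is rejected. A trailing segment with digits, uppercase letters, or hyphens would be rejected too, because of the lowercase-only assumption stated above.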

Alec Fenichel
  • Thanks Alec. I forgot to mention that the ending of the URL can differ: `https://www.examples.org/activity-group/8712/activity/202803?ref=bla` or `https://www.examples.org/activity-group/8712/activity/202803/something`. How can I match those? – Raymond Chenon Nov 05 '17 at 20:12
  • @RaymondChenon Edited – Alec Fenichel Nov 05 '17 at 20:20
  • Thanks again. I tried it [here](https://regex101.com/r/QINDkA/1/): the one with `activity/202803?ref=bla` matches, but the others don't: `activity/202803` and `activity/202803/something`. – Raymond Chenon Nov 05 '17 at 20:32
  • @RaymondChenon I missed the last `*`, fixed. – Alec Fenichel Nov 05 '17 at 20:33

You could use

https?:\/\/(?:[^/]+\/){2}(\d+)\/[^/]+\/(\d+)

See a demo on regex101.com.


Broken down, this says:
https?:\/\/     # http:// or https://
(?:[^/]+\/){2}  # a run of non-"/" characters followed by "/", twice
(\d+)           # 1+ digits
\/[^/]+\/       # same pattern as above
(\d+)           # the other number

You'll need to use group 1 and 2, respectively.
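As a rough illustration of reading groups 1 and 2, here is a small Python sketch. The function name `extract_ids` is hypothetical, and Python's `re` module stands in for whatever engine you actually use.

```python
import re

# Permissive pattern from this answer; "/" needs no escaping in Python.
# Group 1 and group 2 capture the two numbers.
PERMISSIVE = re.compile(r"https?://(?:[^/]+/){2}(\d+)/[^/]+/(\d+)")


def extract_ids(url: str):
    """Return the two numbers as strings, or None if the URL does not match."""
    match = PERMISSIVE.search(url)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    # prints ('8712', '202803')
    print(extract_ids("https://www.examples.org/activity-group/8712/activity/202803?ref=bla"))
    # prints None, because "astring" is not a number
    print(extract_ids("https://www.notthesite.org/group/8712/activity/astring"))
```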


If this is too permissive, use
https:\/\/[^/]+\/activity-group\/(\d+)\/activity\/(\d+)

Which reads

https:\/\/[^/]+     # https:// + some domain name
\/activity-group\/  # /activity-group/
(\d+)               # first number
\/activity\/        # /activity/
(\d+)               # second number

See another demo on regex101.com.
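To see how the two patterns differ in practice, the following Python sketch runs both against two made-up URLs. The URLs and the helper name `compare` are assumptions for illustration only.

```python
import re

# Both patterns come from this answer; "/" needs no escaping in Python.
PERMISSIVE = re.compile(r"https?://(?:[^/]+/){2}(\d+)/[^/]+/(\d+)")
STRICT = re.compile(r"https://[^/]+/activity-group/(\d+)/activity/(\d+)")


def compare(url: str):
    """Return (permissive_matches, strict_matches) for one URL."""
    return bool(PERMISSIVE.search(url)), bool(STRICT.search(url))


if __name__ == "__main__":
    # Hypothetical URLs, not taken from the question.
    # Wrong path names: only the permissive pattern matches. Prints (True, False)
    print(compare("https://www.examples.org/foo/8712/bar/202803"))
    # Right path, different host: both patterns match. Prints (True, True)
    print(compare("https://www.notthesite.org/activity-group/1/activity/2"))
```

The stricter pattern fixes the path segments but still accepts any host, and neither pattern limits the numbers to ten digits or anchors the end of the string.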

Jan
  • Your solution is a gem of wisdom. I thought it was too permissive, so I hacked it a bit. My final solution is `https:\/\/www\.examples\.org\/activity-group\/(\d+)\/activity\/(\d+)` – Raymond Chenon Nov 05 '17 at 20:41
  • @RaymondChenon: Glad to be of any help. – Jan Nov 05 '17 at 20:42

You probably need something like:

(https?:\/\/)?www\.examples\.org\/activity-group\/(\d{1,10})\/activity\/(\d{1,10})(\S*)$

Where:

  • (https?:\/\/)? matches an optional http:// or https:// prefix.
  • www\.examples\.org is your domain name, with the dots escaped so that they match literally.
  • The first (\d{1,10}) matches the first integer, up to 10 digits long (after activity-group).
  • The second (\d{1,10}) matches the second integer, after activity.
  • Finally, (\S*)$ matches any optional data after the second number up to the end of the line, assuming you use the multiline flag (m).

Check it at http://regexr.com/3h448
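A minimal Python sketch of this idea follows, using a pattern with the dots escaped and an optional trailing group. The sample text and the helper name `find_activity_urls` are assumptions for illustration, and `re.MULTILINE` plays the role of the multiline flag.

```python
import re

# Optional scheme, fixed host, two numbers of 1 to 10 digits, optional tail.
# With re.MULTILINE, "$" matches at the end of each line.
PATTERN = re.compile(
    r"(https?://)?www\.examples\.org/activity-group/(\d{1,10})/activity/(\d{1,10})(\S*)$",
    re.MULTILINE,
)


def find_activity_urls(text: str):
    """Return (first_number, second_number, tail) for every matching line."""
    return [(m.group(2), m.group(3), m.group(4)) for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    # Hypothetical multi-line input.
    text = "\n".join([
        "https://www.examples.org/activity-group/8712/activity/202803",
        "www.examples.org/activity-group/1/activity/2/something",
        "https://www.notthesite.org/group/8712/activity/astring",
    ])
    # prints [('8712', '202803', ''), ('1', '2', '/something')]
    print(find_activity_urls(text))
```

Because the tail accepts any non-whitespace characters, a second number longer than ten digits still matches: the extra digits simply land in the tail group.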

Hope it helps!

codeadict