
Suppose I have the following string, representing one or many days or day ranges:

mon,thu..fri,sun

How can I match any arbitrary list of ranges or single days with a regular expression, without expanding the day alternatives twice?

I currently have this:

(?P<weekdays>
    (                        
        \b
        (mon|tue|wed|thu|fri|sat|sun)
        (\.\.(mon|tue|wed|thu|fri|sat|sun))?
        ,?
    )*
)

This works, but it forces me to repeat the day alternatives in the regex (they are simplified here; the real ones are longer). Note that this regex also matches `fri,sat,` (that is, the list may optionally end in a comma); this IS the desired behavior.

I also tried making the range portion a limited repetition using {1,2}, but I am unable to avoid matching the invalid mon..tue..fri because the pattern restarts via the optional comma.

Note that this is part of a longer regex so I can't use the global flag.

This is the Regex101 URL, where I also added some unit tests.

Small edit: used the \b metacharacter instead of a negative lookahead.

istepaniuk

3 Answers

1

You can try this:

/(\bmon\b|\btue\b|\bwed\b|\bthu\b|\bfri\b|\bsat\b|\bsun\b)/g
  • 1
    Thanks. Could you perhaps elaborate a bit more? Try that how? The boundary metacharacter could be an interesting thing to apply here, but I don't see how your regex relates to my question (despite being days) – istepaniuk Dec 31 '20 at 17:23
1

You can use a PCRE named group and reuse its sub-pattern later with the `(?&groupName)` construct:

^(?<weekdays>
    (                        
        \b
        (?<weeks>mon|tue|wed|thu|fri|sat|sun)
        (?:\.\.(?&weeks))?
        ,?
    )+
)$

RegEx Demo


To keep the definition separate from the references, use PCRE's `DEFINE` directive:

(?(DEFINE)
   (?<weeks>mon|tue|wed|thu|fri|sat|sun)
)
^(?P<weekdays>
    (?:                        
        \b
        (?&weeks)
        (?:\.\.(?&weeks))?
        ,?
    )*
)$

RegEx Demo 2
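
If the surrounding code happens to be Python, note that the standard `re` module does not support `(?&name)` or `(?(DEFINE)...)`. A minimal sketch of the same idea under that assumption: define the day alternatives once as a string and interpolate it into the larger pattern. The names `DAY` and `WEEKDAYS` are illustrative and do not come from the original post.

```python
import re

# Single definition of the day alternatives (assumed to be longer in practice).
DAY = r"(?:mon|tue|wed|thu|fri|sat|sun)"

# Same structure as the DEFINE version above, with {DAY} interpolated
# wherever the PCRE pattern uses (?&weeks).
WEEKDAYS = re.compile(
    rf"""
    ^(?P<weekdays>
        (?:
            \b
            {DAY}               # a single day ...
            (?:\.\.{DAY})?      # ... optionally extended to a range
            ,?                  # optional separator (trailing comma allowed)
        )*
    )$
    """,
    re.VERBOSE,
)

for candidate in ("mon,thu..fri,sun", "fri,sat,", "mon..tue..fri", "tuetue"):
    print(candidate, bool(WEEKDAYS.match(candidate)))
```

The day list is still written only once in the source, even though the compiled pattern contains it twice.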

anubhava
  • 1
    Nice, I had no idea you could just reference whole patterns. Note that the original negative lookahead wasn't necessary, apparently `\b` does the trick (to avoid `tuetue`). On your added part, you also used `((?&weeks))`, I guess you mean just `(?&weeks)` – istepaniuk Dec 31 '20 at 17:38
  • 1
    Indeed `(?&weeks)` is sufficient. I just mimicked your original regex that had 5 captured groups. I didn't attempt to clean/refactor regex. Will do it now – anubhava Dec 31 '20 at 17:40
  • @ikegami: Incidentally, that was my originally posted answer with [demo link](https://regex101.com/r/8V2FG6/3) then I thought of reducing overall length :) – anubhava Dec 31 '20 at 18:03
  • 2
    Both are useful! I am not sure why the question was flagged as a duplicate. The solution could be similar, but I am not necessarily asking about reusing patterns, and I imagine there are ways to do this without referencing the pattern. (btw, this answer is much better and more complete than the one in the "original") – istepaniuk Dec 31 '20 at 18:10
  • @istepaniuk: Yes agreed, linked answer is not really a dupe. – anubhava Dec 31 '20 at 18:12
0

Note that you don't even need a reference to a capture group or a capture group at all for what you are trying to do. Playing with a word boundary suffices:

\A \b
(?:
    (?: , |  \.\. )?                
    (?: mon|tue|wed|thu|fri|sat|sun) \b
)++
\z

demo
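
A quick way to check what this pattern accepts is a small sketch in Python, assuming the standard `re` module. Two adaptations are made here and are not part of the answer: `\z` is written as `\Z`, and the possessive `++` is replaced by a plain `+` (possessive quantifiers need Python 3.11 or later), which should accept the same strings for this pattern. The name `CHAINED` is illustrative.

```python
import re

# The pattern from this answer, adapted for Python's re module:
# every day after the first must be preceded by "," or "..".
CHAINED = re.compile(
    r"""
    \A \b
    (?:
        (?: , | \.\. )?
        (?: mon|tue|wed|thu|fri|sat|sun) \b
    )+
    \Z
    """,
    re.VERBOSE,
)

for candidate in ("mon,thu..fri,sun", "mon..tue..fri", "fri,sat,", "tuetue"):
    print(candidate, bool(CHAINED.match(candidate)))
```

Because `,` and `..` are treated as interchangeable separators, a chained range such as `mon..tue..fri` is accepted, and a trailing comma as in `fri,sat,` is rejected.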

Casimir et Hippolyte
  • Interesting, but this would match on "mon..tue..fri", right? You can run the unit tests (on the link you provided) – istepaniuk Jan 02 '21 at 00:56
  • We can try this: `^(?P<weekdays> (?: (?:,|(?<!\....)\.\.|^) (?:mon|tue|wed|thu|fri|sat|sun) )+ )$` – Michail Jan 02 '21 at 02:28