Is there any way to completely invert the match of a regex I am using?

(?!...) works only for simple patterns. I mean, I have a regex that matches multiple formats, but I want to replace everything in a string except the parts that match those formats.

For example: I wrote a complex regex pattern to find weekdays, hours, months, and years. Instead of finding these matches, splitting my string with the pattern, and joining everything that matches, I could replace everything else in a single shot if there were an inverse match.

The solution given in How to "inverse match" with regex? does not support everything I need.

Example

hr = """
Monday: 11:30am - 9:30pm Tuesday: 11:30am - 9:30pm
Wednesday: 11:30am - 10:00pm Thursday: 11:30am - 10:00pm 
Friday: 11:30am - 10:30pm Saturday: 11:00am - 10:30pm
(brunch served until 3pm) Sunday: 10:30am - 9:30pm (brunch served until 3pm)
Happy Hour and Special Appetizer menu starting at $3 in the bar. Hours from 4 - 7pm Daily.
$4 BURGER special available on Monday. Wednesday: 1/2 off all bottled wines (4-close)"""


import re

# Verbose-mode pattern: a day name, then an opening and a closing time.
dayPattern = r"""
   (?:mon|tue|wed|thu|fri|sat|sun|thurs)(?:day)?(?:[.:])*
   \s*
   (?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Start hour
   \s*[-|to]+\s*
   (?:\d{1,2}(?:[:]\d{1,2})?)\s*(?:[ap][.]?m.?) # Close hour
"""

newStr = re.findall(dayPattern, hr, re.VERBOSE | re.IGNORECASE)

print(" ".join(newStr))

OUTPUT

Monday: 11:30am - 9:30pm  Thursday: 11:30am - 10:00pm  Friday: 11:30am - 10:30pm  Sunday: 10:30am - 9:30pm

But this output is missing the Tuesday, Wednesday, and Saturday entries.

I could modify my regex to include these patterns too.

But instead of doing that, is there a way to remove every word except Monday/Tuesday/..., Mon/Tue/Wed/..., and 11:00am/12pm/...?

That is, the output I want is exactly this: Monday: 11:30am - 9:30pm Tuesday: 11:30am - 9:30pm Wednesday: 11:30am - 10:00pm Thursday: 11:30am - 10:00pm Friday: 11:30am - 10:30pm Saturday: 11:00am - 10:30pm Sunday: 10:30am - 9:30pm

Garfield
  • Maybe if you could post your code, we could help. –  Sep 13 '13 at 13:23
  • Could you include the code you have so far, and example input/output? It is hard to understand what you are trying to do. – amon Sep 13 '13 at 13:23
  • By chance, do you mean the !~ operator? – Slaven Rezic Sep 13 '13 at 13:43
  • Yes, you are right Slaven Rezic – Garfield Sep 13 '13 at 13:44
  • I believe you can see what I expect: everything except the days, time ranges, spaces, and special characters should be removed. – Garfield Sep 13 '13 at 13:45
  • Extracting and joining the parts you do want seems like a much better approach than finding everything which doesn't match and substituting it with nothing. For one thing, it seems that the backtracking behavior would be quite excessive if the regex has to examine every position in the string separately. Specifying what you do want avoids that. – tripleee Sep 13 '13 at 13:45
  • By the by, adding newlines to your test data would make this a whole lot more readable. – tripleee Sep 13 '13 at 13:47
  • tripleee, thanks, I will format it. The pattern I used matches just 2 types, and I have a few more formats to match. So should I keep adding patterns and do the above? Is that a bad idea? – Garfield Sep 13 '13 at 13:49
  • In Python you have to negate the regex yourself; AFAIK there is no such operator. – mpapec Sep 13 '13 at 14:06
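
For reference, the "replace everything that does not match" idea discussed in the comments above can be approximated with an alternation inside `re.sub`: the first alternative captures the text to keep, and the second consumes any other single character. This is only a minimal sketch; the `KEEP` pattern, the shortened `text` sample, and the `keep_or_drop` helper are assumptions made up for illustration, not code from the question.

```python
import re

# Shortened sample input (an assumption, cut down from the question's data).
text = "Monday: 11:30am - 9:30pm (brunch served until 3pm) Sunday: 10:30am - 9:30pm"

# Hypothetical "keep" pattern: one day-and-hours entry.
KEEP = r"\w{3,6}day:\s*\d{1,2}:\d{2}[ap]m\s*-\s*\d{1,2}:\d{2}[ap]m"

# Group 1 is text to keep; the second alternative eats any other character.
inverse = re.compile("(" + KEEP + ")|.", re.DOTALL)

def keep_or_drop(match):
    # Return the kept text plus a separator, or nothing for a dropped character.
    kept = match.group(1)
    return kept + " " if kept else ""

cleaned = inverse.sub(keep_or_drop, text).strip()
print(cleaned)  # Monday: 11:30am - 9:30pm Sunday: 10:30am - 9:30pm
```

Note that this still requires a pattern describing what to keep, and the regex is tried at every position of the input, so it is effectively the same work as extracting and joining the matches.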

1 Answer

I don't understand your intent of doing a reverse regular expression. findall() seems a natural way of selecting your times, like this:

' '.join(re.findall(r'\w{3,6}day:\s*\d{1,2}:\d{1,2}[ap]m\s*-\s*\d{1,2}:\d{1,2}[ap]m', hr))

It yields:

'Monday: 11:30am - 9:30pm Tuesday: 11:30am - 9:30pm Wednesday: 11:30am - 10:00pm Thursday: 11:30am - 10:00pm Friday: 11:30am - 10:30pm Saturday: 11:00am - 10:30pm Sunday: 10:30am - 9:30pm'
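
If abbreviated day names and looser time formats also need to match, the same `findall()` approach could be extended with explicit alternations. The following is a rough sketch under assumptions: the `DAY`, `TIME`, and `SCHEDULE` names and the exact set of abbreviations are hypothetical choices, not part of the original answer.

```python
import re

hr = """
Monday: 11:30am - 9:30pm Tuesday: 11:30am - 9:30pm
Wednesday: 11:30am - 10:00pm Thursday: 11:30am - 10:00pm
Friday: 11:30am - 10:30pm Saturday: 11:00am - 10:30pm
(brunch served until 3pm) Sunday: 10:30am - 9:30pm (brunch served until 3pm)
Happy Hour and Special Appetizer menu starting at $3 in the bar. Hours from 4 - 7pm Daily.
$4 BURGER special available on Monday. Wednesday: 1/2 off all bottled wines (4-close)"""

# Hypothetical building blocks: each day lists its own middle letters, so
# "Tuesday", "Wednesday" and "Saturday" match as well as "Tue", "Wed", "Sat".
DAY = r"(?:mon|tue(?:s)?|wed(?:nes)?|thu(?:rs?)?|fri|sat(?:ur)?|sun)(?:day)?"
TIME = r"\d{1,2}(?::\d{2})?\s*[ap]\.?m\.?"

# Day name, optional punctuation, start time, "-" or "to", end time.
SCHEDULE = re.compile(
    r"\b" + DAY + r"[.:]*\s*" + TIME + r"\s*(?:-|to)\s*" + TIME,
    re.IGNORECASE,
)

# Only non-capturing groups are used, so findall() returns whole matches.
result = " ".join(SCHEDULE.findall(hr))
print(result)
```

On the sample input this yields the same seven day-and-hours entries as the one-liner above, while leaving out "on Monday." and "Wednesday: 1/2 off", which have no time range after the day name.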
Birei
  • So I should optimize my regex to match as many formats as possible, instead of going for an inverse regex. – Garfield Sep 13 '13 at 14:55
  • @codelover: With that kind of input, yes, I would stick with this approach. This `regex` is very straightforward and there is room to add some alternations, look-aheads and the like. – Birei Sep 13 '13 at 15:53
  • @Birei Thanks. Could you recommend some links on look-aheads? – Garfield Sep 13 '13 at 16:25
  • @codelover: Take a look [here](http://www.regular-expressions.info/lookaround.html) or [here](http://www.rexegg.com/regex-lookarounds.html) – Birei Sep 13 '13 at 19:31
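
As a small illustration of the look-aheads mentioned in these comments, here is a sketch that separates day names followed by an opening time from day names that are not. The shortened `text` sample and the two pattern names are assumptions made for this example only.

```python
import re

# Shortened sample input (an assumption, cut down from the question's data).
text = ("Monday: 11:30am - 9:30pm Tuesday: 11:30am - 9:30pm "
        "$4 BURGER special available on Monday. "
        "Wednesday: 1/2 off all bottled wines (4-close)")

# Positive look-ahead: a day name counts only when an hh:mm am/pm time follows.
day_with_hours = re.compile(r"\b\w{3,6}day(?=:\s*\d{1,2}:\d{2}[ap]m)")

# Negative look-ahead: day names NOT followed by such a time.
day_without_hours = re.compile(r"\b\w{3,6}day\b(?!:\s*\d{1,2}:\d{2}[ap]m)")

with_hours = day_with_hours.findall(text)
without_hours = day_without_hours.findall(text)

print(with_hours)     # ['Monday', 'Tuesday']
print(without_hours)  # ['Monday', 'Wednesday']
```

The look-ahead checks what follows without consuming it, so only the day name itself ends up in each match.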