1

Here is my regex:

/On.* \d{1,2}\/\d{1,2}\/\d{1,4} \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/

to match:

On 3/14/11 2:55 PM, XXXXX XXXXXX wrote:

I need this Regex to also match:

On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote:

So I tried this:

/On.* \d{1,2}\/\d{1,2}\/\d{1,4}(, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/

But that breaks the other matches.

Am I making the (, at)? group optional correctly?

Thanks

AnApprentice
  • What is the `.*` after `On` for? You are not consistent about captures: you use a non-capturing group for `(?:AM|PM)` but a capturing group for `(, at)`, which seems less important. Is there any reason? – sawa Mar 28 '11 at 01:49
  • You keep asking questions about the same problem: http://stackoverflow.com/q/5239883/128421, http://stackoverflow.com/q/5130733/128421 – the Tin Man Mar 28 '11 at 02:08
  • What “other matches” does your second regex “break”? If you want help devising a regexp that will match other strings, then you will need to tell us about those other strings. – Chris Johnsen Mar 28 '11 at 02:38

3 Answers

1

I changed your regex just slightly (the optional group is now non-capturing), and I am able to match both strings. The regex I have is:

/On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/ 

Comparing the results of the two:

irb(main):023:0> s1 = "On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote:"
=> "On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote:"
irb(main):024:0> s2 = "On 3/14/11 2:55 PM, XXXXX XXXXXX wrote:"
=> "On 3/14/11 2:55 PM, XXXXX XXXXXX wrote:"
# The pattern without the ? quantifier, so ", at" is required
irb(main):025:0> m = /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at) \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/
=> /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at) \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/
irb(main):026:0> s1.match(m)
=> #<MatchData "On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote:">
irb(main):027:0> s2.match(m)
=> nil

#The updated Regex
irb(main):028:0> m = /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote/
=> /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote/
irb(main):029:0> s1.match(m)
=> #<MatchData "On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote">
irb(main):030:0> s2.match(m)
=> #<MatchData "On 3/14/11 2:55 PM, XXXXX XXXXXX wrote">
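
For anyone who prefers a script to an irb session, here is a minimal sketch of the same check. The constant names, the `matches_header?` helper, and the third sample line are assumptions made up for illustration; only the pattern and the first two strings come from the question.

```ruby
# (?:, at)? is a non-capturing group that matches ", at" zero or one time.
HEADER_PATTERN = /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(?:, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/

SAMPLE_LINES = [
  "On 3/14/11 2:55 PM, XXXXX XXXXXX wrote:",
  "On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote:",
  "On Monday, XXXXX wrote:" # hypothetical line with no date, should not match
].freeze

# Hypothetical helper: true when the line looks like a reply header.
def matches_header?(line)
  HEADER_PATTERN.match?(line)
end

SAMPLE_LINES.each do |line|
  puts "#{matches_header?(line) ? 'match   ' : 'no match'}  #{line}"
end
```

`Regexp#match?` needs Ruby 2.4 or later; on older versions `!!(line =~ HEADER_PATTERN)` should behave the same way for this purpose.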
Sean Hill
0

The following regex works for both cases:

On\s*\d{1,2}\/\d{1,2}\/\d{1,4}[\s,]*(at)?\s*\d{1,2}:\d{1,2}\s*(?:AM|PM),\s*.*wrote:
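
A quick way to try this pattern in Ruby is sketched below. The constant names and the weekday line are assumptions added for illustration; they are not part of the question. Note that `(at)?` is a capturing group here, so `m[1]` is `"at"` when the word is present and `nil` otherwise, and that `On\s*` leaves no room for extra text between `On` and the date.

```ruby
# The pattern above wrapped in a Ruby regex literal.
FLEXIBLE_PATTERN = /On\s*\d{1,2}\/\d{1,2}\/\d{1,4}[\s,]*(at)?\s*\d{1,2}:\d{1,2}\s*(?:AM|PM),\s*.*wrote:/

US_STYLE = "On 3/14/11 2:55 PM, XXXXX XXXXXX wrote:"
DAY_FIRST = "On 25/03/2011, at 2:19 AM, XXXXX XXXXXXXX wrote:"
# Hypothetical line with a weekday before the date (not from the question).
WITH_WEEKDAY = "On Mon 3/14/11 2:55 PM, XXXXX XXXXXX wrote:"

[US_STYLE, DAY_FIRST, WITH_WEEKDAY].each do |line|
  m = FLEXIBLE_PATTERN.match(line)
  status = m ? "match (group 1 = #{m[1].inspect})" : "no match"
  puts "#{status}  #{line}"
end
```

If the capture is not needed, `(?:at)?` would keep the group out of the match data.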
pcofre
0

The problem with other input strings may be caused by the .* idiom. It's greedy and wants to consume as much of the input as it can.

If your input is, for example, a date, followed by some random text, and then another date, your regex will treat the two dates and the text between them as one single match. Most of it will be consumed by .*.

In most cases it's better to use a lazy quantifier. Syntactically, you write .*? instead of .*. You have two .* in your regex. Try replacing both with .*?

/On.*? \d{1,2}\/\d{1,2}\/\d{1,4}(, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*?wrote:/
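
To see the difference, here is a small sketch that runs the greedy and the lazy version against a single line containing two headers. The input string and the constant names are made up for illustration (they assume two quoted headers ended up on one line); the two patterns are the ones from the question and from this answer.

```ruby
# Greedy version from the question and lazy version from this answer.
GREEDY = /On.* \d{1,2}\/\d{1,2}\/\d{1,4}(, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*wrote:/
LAZY   = /On.*? \d{1,2}\/\d{1,2}\/\d{1,4}(, at)? \d{1,2}:\d{1,2} (?:AM|PM),.*?wrote:/

# Hypothetical input: two reply headers on the same line.
TWO_HEADERS = "On 3/14/11 2:55 PM, A wrote: On 25/03/2011, at 2:19 AM, B wrote:"

# String#[] with a regex returns the matched substring (or nil).
greedy_hit = TWO_HEADERS[GREEDY]
lazy_hit   = TWO_HEADERS[LAZY]

puts "greedy: #{greedy_hit}" # spans both headers as one match
puts "lazy:   #{lazy_hit}"   # stops at the first "wrote:"
```

The greedy pattern swallows everything up to the last `wrote:` on the line, while the lazy one stops at the first place the rest of the pattern can match.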

If that doesn't work, you'll have to post the failing dates here and you will most certainly get more feedback from this community.

Staffan Nöteberg