I am using the pattern below to extract confirmation dates from a text file and convert them to date objects (see my earlier post, Extract/convert date from string in MS Access).

The current pattern matches every string that looks like a date, but a match may not be the confirmation date (which is always preceded by Confirmed by) and may lack complete date information (e.g. no AM or PM).

 Pattern: (\d+/\d+/\d+\s+\d+:\d+:\d+\s+\w+|\d+-\w+-\d+\s+\d+:\d+:\d+)

Sample text:

WHEN COMPARED WITH RESULT OF 7/13/12 09:06:42  NO SIGNIFICANT 
CHANGE; Confirmed by SMITH, MD, JOHN (2242) on 7/14/2012 3:46:21 PM;

The above pattern matches the following:

WHEN COMPARED WITH RESULT OF 7/13/12 09:06:42  NO SIGNIFICANT 
                             ^^^^^^^^^^^^^^^^^^^^
CHANGE; Confirmed by SMITH, MD, JOHN (2242) on 7/14/2012 3:46:21 PM;
                                               ^^^^^^^^^^^^^^^^^^^^
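
The two matches marked above can be reproduced with a short sketch. Python's re module is used here purely for illustration; that choice is an assumption, since the question targets MS Access, whose regex engine may differ in details.

```python
import re

# Sample text from the question (two lines, as in the source file).
text = (
    "WHEN COMPARED WITH RESULT OF 7/13/12 09:06:42  NO SIGNIFICANT \n"
    "CHANGE; Confirmed by SMITH, MD, JOHN (2242) on 7/14/2012 3:46:21 PM;"
)

# The pattern exactly as posted in the question.
pattern = re.compile(
    r"(\d+/\d+/\d+\s+\d+:\d+:\d+\s+\w+|\d+-\w+-\d+\s+\d+:\d+:\d+)"
)

# Collect every date-like match; the trailing \w+ accepts any word,
# so "NO" is swallowed where AM/PM was expected.
matches = [m.group(1) for m in pattern.finditer(text)]
for found in matches:
    print(repr(found))
```

The first match ends in NO rather than AM or PM because \w+ accepts any word after the time.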

I want the pattern to look for the date only in the segment of the text file that begins with Confirmed by and ends with a semicolon. Also, so that the time converts properly, the pattern should match only when AM or PM appears at the end. How can I restrict the pattern to this segment and add the AM/PM requirement?

Can anyone help?

regulus

4 Answers

To match the end of the string, use $ at the end of your regex. To match the entire phrase "Confirmed by <someone> on <date>", use plain text (plain text works in a regex as well: if it contains no special characters, the matcher will match it verbatim). You need a negative look-ahead to exclude entire words. So maybe something like this:

Confirmed by (?!\ on\ )(\d+/\d+/\d+\s+\d+:\d+:\d+\s+\w+|\d+-\w+-\d+\s+\d+:\d+:\d+)$

This is meant to match a string that starts with "Confirmed by", followed by anything except " on ", followed by the date that you capture, and then the end of the string.

Edit: the negative look-ahead part is tricky; see the answer linked below for more detail:

A regular expression to exclude a word/string
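
A quick check of this answer's pattern against the question's sample line, sketched in Python (an assumption; the asker's engine is the one in MS Access), suggests why it may fail to capture anything. A lookahead is zero-width: it only tests the characters immediately after "Confirmed by " and does not skip over the name. The `$` also requires the date to end the string, while the sample ends with a semicolon.

```python
import re

# Second line of the sample text from the question.
text = "CHANGE; Confirmed by SMITH, MD, JOHN (2242) on 7/14/2012 3:46:21 PM;"

# The pattern as posted in this answer.
lookahead_pattern = re.compile(
    r"Confirmed by (?!\ on\ )"
    r"(\d+/\d+/\d+\s+\d+:\d+:\d+\s+\w+|\d+-\w+-\d+\s+\d+:\d+:\d+)$"
)

# The lookahead succeeds (the text after "Confirmed by " is not " on "),
# but the next character must then be a digit and it is "S", so no match.
result = lookahead_pattern.search(text)
print(result)
```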

maxko87
  • I tried to use this pattern with the source text using [GSKinner's Reg Exr tool](http://gskinner.com/RegExr/?), but it doesn't seem to capture the date. For the date pattern, the pattern mentioned below (\d+/\d+/\d+\s+\d+:\d+:\d+\s+(?:AM|PM)); works nicely. But still can't get the negative look-ahead to work. – regulus Jul 27 '12 at 21:58
  • I changed the quotes within the negative lookahead to escaped spaces and deleted the square brackets, does that help? – maxko87 Jul 27 '12 at 22:04

I don't see any need for a lookahead here, positive or negative. This works correctly on your sample string:

Confirmed by [^;]*(\d+/\d+/\d+\s+\d+:\d+:\d+(?:\s+(?:AM|PM))?|\d+-\w+-\d+\s+\d+:\d+:\d+);

The [^;]* effectively corrals the match between a Confirmed by sequence and its closing semicolon. (I'm assuming the semicolon will always be present.)

(?:\s+(?:AM|PM))? makes the AM/PM optional, along with its leading whitespace.

The actual date will be stored in capturing group #1.
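
As a rough illustration, the pattern can be tried in Python (an assumption; the question targets MS Access) against the sample text. The sketch also probes one edge case that the sample does not cover: because [^;]* is greedy, it can absorb the leading digit of a two-digit month. The second input string and the lazy variant [^;]*? are hypothetical additions for this demonstration, not part of the answer.

```python
import re

# Sample text from the question.
text = (
    "WHEN COMPARED WITH RESULT OF 7/13/12 09:06:42  NO SIGNIFICANT \n"
    "CHANGE; Confirmed by SMITH, MD, JOHN (2242) on 7/14/2012 3:46:21 PM;"
)

DATE = r"(\d+/\d+/\d+\s+\d+:\d+:\d+(?:\s+(?:AM|PM))?|\d+-\w+-\d+\s+\d+:\d+:\d+);"

# The pattern as posted in this answer (greedy character class).
class_pattern = re.compile(r"Confirmed by [^;]*" + DATE)
confirmed = class_pattern.search(text).group(1)
print(confirmed)

# Hypothetical input with a two-digit month, invented for this sketch.
two_digit = "Confirmed by DOE, MD, JANE (1111) on 12/14/2012 3:46:21 PM;"
greedy = class_pattern.search(two_digit).group(1)

# Hypothetical tweak: a lazy class stops at the first place a date can start.
lazy_pattern = re.compile(r"Confirmed by [^;]*?" + DATE)
lazy = lazy_pattern.search(two_digit).group(1)
print(greedy, "|", lazy)
```

On the sample text the capture is the full confirmation date. On the hypothetical two-digit-month input the greedy version drops the leading "1", while the lazy version keeps it.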

Alan Moore

Try this:

(\d+/\d+/\d+\s+\d+:\d+:\d+\s+(?:AM|PM));
nickb
Frobzig
  • This will (nicely!) match all dates in the source text. Any ideas how I can restrict to the segment that begins with 'Confirmed by' and ends with a ";"? – regulus Jul 27 '12 at 21:42

The simplest answer is, more often than not, a good enough solution. Turning off the default greedy behavior (with the question mark: .*?) makes the regular expression look for the shortest match that satisfies the pattern. A pattern never matches the same text more than once, which means each Confirmed by can be paired with only one date, in this case the next one that follows.

Confirmed by.*?(\d+/\d+/\d+\s+\d+:\d+:\d+\s+(?:AM|PM));
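
A minimal sketch of how this could be used end to end, in Python rather than MS Access (an assumption made only for illustration). It applies the pattern to the question's sample text and then converts the captured string to a date object with datetime.strptime. The format string is an assumption based on the sample's M/D/YYYY h:mm:ss AM/PM layout. Note that in Python `.` does not cross line breaks by default, so this assumes Confirmed by and its date sit on the same line; re.DOTALL would lift that restriction.

```python
import re
from datetime import datetime

# Sample text from the question.
text = (
    "WHEN COMPARED WITH RESULT OF 7/13/12 09:06:42  NO SIGNIFICANT \n"
    "CHANGE; Confirmed by SMITH, MD, JOHN (2242) on 7/14/2012 3:46:21 PM;"
)

# The lazy pattern as posted in this answer.
lazy_pattern = re.compile(
    r"Confirmed by.*?(\d+/\d+/\d+\s+\d+:\d+:\d+\s+(?:AM|PM));"
)

match = lazy_pattern.search(text)
stamp = match.group(1)  # the date string, without the trailing semicolon

# Assumed layout: month/day/four-digit year, 12-hour clock, AM/PM marker.
confirmed_at = datetime.strptime(stamp, "%m/%d/%Y %I:%M:%S %p")
print(stamp, "->", confirmed_at.isoformat())
```

Because AM or PM is required by the pattern, the earlier 7/13/12 09:06:42 entry cannot be captured even if it were to follow a Confirmed by.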
alaeus