
I know a similar question has been asked before, but I couldn't get that solution to work. It's this one:

Regular expression to match a line that doesn't contain a word?

Here's the text

     ID   Type    Code    Test Name                  Dept    Date --- Time --- By
 ---- ---- ---------- ------------------------- ------ -------- --------

 01     S  10231AB=,+ Test1 With Spaces       20180913  1:08 AM ENIG01
 02     S  %SBTEX1    Test2 With Spaces       20180912 10:02 AM MYR001
 03     B  6399AB=    Test3 With Spaces       20180912 12:07 AM WDHLSY1
 04     S  4848AB=,4+ Test4 With Spaces       20180912 12:07 AM WDHLSY1
 05     S  899AB=,+   TSH+                    20180913  1:08 AM ENIG01
 06     S  899AB=,+   TSH+  

Lines 1 and 2 are not a match because they contain the text "10231" and "%SBTEX1".

Line 5 is the match.

Line 6 is not a match because it does not have a string of digits such as "20180913" followed by the date and time.

I tried, but I could not even come up with a regular expression that matched every line except line 6.

Here's the Regex from the post mentioned above. It excludes any line that contains a given word.

^((?!hede).)*$
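
As a quick illustration of how that pattern behaves, here is a minimal sketch in Python. Python's `re` module, the helper name `lines_without_word`, and the sample words are assumptions made for demonstration; the question itself targets the .NET engine used by RegexStorm.

```python
import re

# Tempered negative lookahead: before consuming each character,
# check that the word "hede" does not start at that position.
PATTERN = re.compile(r"^((?!hede).)*$", re.MULTILINE)


def lines_without_word(text):
    """Return every line of `text` that the pattern matches in full."""
    return [m.group(0) for m in PATTERN.finditer(text)]


sample = "hoho\nhihi\nhaha\nhede"
print(lines_without_word(sample))
```

The line `hede` is rejected because the lookahead fails at its first character, so the pattern can never reach `$` on that line.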

The Question:

A big shout out to Wiktor Stribiżew who solved my original question. But I had omitted some text and when I tried to implement his solution, I realized the problem was more complicated than I had initially thought.

If you would like to see his solution to the original question, please visit the link below.

Wiktor's Solution To The Original Question

Wiktor, if you could, please post your solution on RegexStorm.Net/Tester again. That was amazing!

Thank you,

Mark S.

Mark
  • Try [`(?m)(?>^[\t\p{Zs}]*\d+\s+\w\s+\S+)(?<!\s\S*(?<!\d)(?:10231|%SBTEX1)(?!\d)\S*)`](http://regexstorm.net/tester?p=%28%3fm%29%28%3f%3e%5e%5b%5ct%5cp%7bZs%7d%5d*%5cd%2b%5cs%2b%5cw%5cs%2b%5cS%2b%29%28%3f%3c!%5cs%5cS*%28%3f%3c!%5cd%29%28%3f%3a10231%7c%25SBTEX1%29%28%3f!%5cd%29%5cS*%29&i=+01+++++S++10231AB%3d%2c%2b+Test1%0d%0a+02+++++S++%25SBTEX1++++Test2+With+Spaces+++++++20180912+10%3a02+AM+MYR001%0d%0a+03+++++B++6399AB%3d++++Test3%0d%0a+04+++++S++4848AB%3d%2c4%2b+Test4%0d%0a+05+++++S++899AB%3d%2c%2b+++TSH%2b+%0d%0a+06+++++S++899AB%3d%2c%2b+++TSH%2b++&o=m). I am a bit unsure of the rule here. – Wiktor Stribiżew Sep 14 '18 at 14:02
  • Maybe just [`(?m)(?>^\s*\d+\s+\w\s+\S+)(?<!(?:10231|%SBTEX1)\S*)`](http://regexstorm.net/tester?p=%28%3fm%29%28%3f%3e%5e%5cs*%5cd%2b%5cs%2b%5cw%5cs%2b%5cS%2b%29%28%3f%3c!%28%3f%3a10231%7c%25SBTEX1%29%5cS*%29&i=+01+++++S++10231AB%3d%2c%2b+Test1%0d%0a+02+++++S++%25SBTEX1++++Test2+With+Spaces+++++++20180912+10%3a02+AM+MYR001%0d%0a+03+++++B++6399AB%3d++++Test3%0d%0a+04+++++S++4848AB%3d%2c4%2b+Test4%0d%0a+05+++++S++899AB%3d%2c%2b+++TSH%2b+%0d%0a+06+++++S++899AB%3d%2c%2b+++TSH%2b++&o=m)? – Wiktor Stribiżew Sep 14 '18 at 14:04
  • I added to the second one that you posted. We almost have it but it is counting line 4 instead of line 5. How can we specify to stop at the line end? `(?m)(?>^[\t\p{Zs}]*\d+\s+S\s+\S+)(?<!\s\S*(?<!\d)(?:10231|%SBTEX1)(?!\d)\S*)\s+\w+.+\d+\s+\d` – Mark Sep 14 '18 at 14:22
  • On a match, the line will end with an ID such as "WDHLSY1" – Mark Sep 14 '18 at 14:26
  • Sorry, I do not get what you mean. To match end of a line, use `\r?$`. Remember to replace `\s` with `[\p{Zs}\t]` if you want to stay on a line while matching. – Wiktor Stribiżew Sep 14 '18 at 14:28

2 Answers


You may use

(?m)^\d+\s+\w\s+\d+(?<!\s(?:10231|91431))\r?$

See the regex demo.

I assume the lines do not start with whitespace, so I removed the initial \s+ from your pattern and added ^ as a start-of-line anchor. The (?m) modifier makes both ^ and $ match at line boundaries, and since $ then matches only right before the LF, \r? is needed in front of $ to match lines that end in CRLF.

Pattern details

  • (?m) - ^ now matches the start of a line and $ matches the end of a line
  • ^ - start of a line
  • \d+ - 1+ digits
  • \s+ - 1+ whitespaces (replace with [\p{Zs}\t]+ to only match horizontal whitespaces ([^\S\r\n]+ might also do))
  • \w - a word char
  • \s+ - 1+ whitespaces
  • \d+ - 1+ digits
  • (?<!\s(?:10231|91431)) - a negative lookbehind that fails the match if, immediately to the left of the current location, there is a whitespace and either of the two numeric values
  • \r?$ - an optional CR and end of a line anchor.
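
The pattern above can be tried outside RegexStorm too. The following is a minimal sketch in Python, which is an assumption on my part since the answer targets .NET; it works here only because both lookbehind alternatives have the same length, which Python's `re` requires. The helper name `matching_lines` is hypothetical, and the sample input is the three-line text used in the linked demo.

```python
import re

# Same pattern as in the answer. The lookbehind is fixed-width
# (one whitespace plus five characters), so Python's `re` accepts it.
PATTERN = re.compile(r"(?m)^\d+\s+\w\s+\d+(?<!\s(?:10231|91431))\r?$")


def matching_lines(text):
    """Return the matched lines, with the optional trailing CR removed."""
    return [m.group(0).rstrip("\r") for m in PATTERN.finditer(text)]


sample = "01     S  10231\r\n02     S  91431\r\n03     S  899"
print(matching_lines(sample))
```

Only the third line is returned, because the first two end in one of the excluded values preceded by whitespace.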
Wiktor Stribiżew
  • It's really good but I probably should have included the full text. I'm trying it on the text that I'm really trying to find the answer to and I can't get it to work. I'll study your answer tomorrow and see if I can get it to work. If not, I'll post the entire text and you'll probably be able to get it. I really appreciate your help! – Mark Sep 13 '18 at 20:36
  • @MarkS: The main point is to find the place where you may "anchor" the restricting lookaround and then use the lookaround (if a lookahead, it must be in front, a lookbehind, as you see, is after the more generic pattern). The same can be achieved with [`(?m)^\d+\s+\w\s+(?!(?:10231|91431)\r?$)\d+\r?$`](http://regexstorm.net/tester?p=%28%3fm%29%5e%5cd%2b%5cs%2b%5cw%5cs%2b%28%3f!%28%3f%3a10231%7c91431%29%5cr%3f%24%29%5cd%2b%5cr%3f%24&i=01+++++S++10231%0d%0a02+++++S++91431%0d%0a03+++++S++899&o=m). – Wiktor Stribiżew Sep 13 '18 at 20:38
  • What if I need to keep going with the regular expression? Can I keep adding to it after the look behind? – Mark Sep 13 '18 at 22:46
  • @MarkS Yes, try. Let me know if you have issues. – Wiktor Stribiżew Sep 13 '18 at 23:17
  • I updated the question Wiktor. I came into work this morning and have been trying for an hour but this problem is beyond my skills with Regex unfortunately. I believe you'll be able to figure it out. Thank you in advance for your support! – Mark Sep 14 '18 at 13:57

The answer for this particular problem is:

(?m)(?>^[\t\p{Zs}]*\d+\s+S\s+\S+)(?<!\s\S*(?<!\d)(?:10231|%SBTEX1)(?!\d)\S*).+\d+[\p{Zs}\t]+\d+

Click the hyperlink below to be taken to this solution on RegexStorm.Net/Tester so you can mess around with the Regex yourself for learning purposes.

Interactive Solution On RegexStorm.Net/Tester

This matches lines 4 and 5, which is what I wanted. Originally I had

(?m)(?>^[\t\p{Zs}]*\d+\s+S\s+\S+)(?<!\s\S*(?<!\d)(?:10231|%SBTEX1)(?!\d)\S*).+\d+\s+\d+

which matched only line 4. I then read Wiktor's comment, which said:

"Remember to replace \s with [\p{Zs}\t] if you want to stay on a line while matching."

So I then replaced the \s+ at the end of this Regex with [\p{Zs}\t]+ and got the answer that will work for my particular problem. One more time, it is:

(?m)(?>^[\t\p{Zs}]*\d+\s+S\s+\S+)(?<!\s\S*(?<!\d)(?:10231|%SBTEX1)(?!\d)\S*).+\d+[\p{Zs}\t]+\d+
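
A plausible reading of why the trailing \s+ swallowed line 5: \s also matches line breaks, so the match for line 4 can run across the newline and consume the ID at the start of line 5. The sketch below shows that effect in Python with deliberately simplified patterns and a shortened two-line sample; both are assumptions for illustration, not the full .NET regex above.

```python
import re

sample = (
    "04 S 20180912 12:07 AM WDHLSY1\n"
    "05 S 20180913  1:08 AM ENIG01"
)

# \s+ may match the "\n", so one match can spill into the next line.
loose = re.compile(r"(?m)^\d+ S .+\d+\s+\d+")
# [ \t]+ allows only spaces and tabs, so each match stays on its line.
strict = re.compile(r"(?m)^\d+ S .+\d+[ \t]+\d+")

loose_matches = loose.findall(sample)
strict_matches = strict.findall(sample)

print(len(loose_matches), "match(es) with \\s+")
print(len(strict_matches), "match(es) with [ \\t]+")
```

With `\s+`, the single match starts on line 04 and ends on the `05` of the next line, so line 05 never gets a match of its own. With `[ \t]+`, each line is matched separately.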

I would also encourage anyone who needs to exclude a string of text from a Regex match to adapt this solution to their own needs.
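
For readers working outside .NET, here is one way the idea might be adapted. This is a sketch under stated assumptions, not a translation of the regex above: Python's `re` does not accept the variable-length lookbehind or `\p{Zs}`, so the sketch swaps in a negative lookahead placed before the code column and uses `[ \t]` for horizontal whitespace. The names `build_pattern` and `matching_ids` are hypothetical.

```python
import re

SAMPLE = "\n".join([
    " 01     S  10231AB=,+ Test1 With Spaces       20180913  1:08 AM ENIG01",
    " 02     S  %SBTEX1    Test2 With Spaces       20180912 10:02 AM MYR001",
    " 03     B  6399AB=    Test3 With Spaces       20180912 12:07 AM WDHLSY1",
    " 04     S  4848AB=,4+ Test4 With Spaces       20180912 12:07 AM WDHLSY1",
    " 05     S  899AB=,+   TSH+                    20180913  1:08 AM ENIG01",
    " 06     S  899AB=,+   TSH+  ",
])


def build_pattern(excluded):
    """Build a matcher that rejects lines whose code holds an excluded token."""
    alternatives = "|".join(re.escape(token) for token in excluded)
    return re.compile(
        r"^[ \t]*(\d+)[ \t]+S[ \t]+"                      # ID, then type "S"
        r"(?!\S*(?<!\d)(?:" + alternatives + r")(?!\d))"  # code must not hold a token
        r"\S+.+\d+[ \t]+\d+",                             # code, name, date and time digits
        re.MULTILINE,
    )


def matching_ids(text, excluded=("10231", "%SBTEX1")):
    """Return the ID column of every line that matches."""
    return [m.group(1) for m in build_pattern(excluded).finditer(text)]


print(matching_ids(SAMPLE))
```

Passing a different `excluded` tuple changes which codes are rejected without touching the rest of the pattern.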

Thank you Wiktor. I couldn't have gotten this solution without your help!

Mark