Capture all text per group

Question

I have some data that looks like this:

DEC 12, 2020
incoming 192.168.0.5 10:30
outgoing 192.168.0.5 13:23
DEC 13, 2020
incoming 192.168.0.6 09:34
outgoing 192.168.0.6 14:12

I am trying to get the date and all data for that date into one grouping like so:

First match
Group 1 - DEC 12, 2020
Group 2 - incoming 192.168.0.5 10:30
          outgoing 192.168.0.5 13:23

Second match
Group 1 - DEC 13, 2020
Group 2 - incoming 192.168.0.6 09:34
          outgoing 192.168.0.6 14:12

I have tried this regex:

^([A-Z] \d+, \d{4})(.*)

The problem is, this reads all the way to the end instead of stopping at the next match (DEC 13, 2020) like so:

Group 1 - DEC 12, 2020
Group 2 - incoming 192.168.0.5 10:30
          outgoing 192.168.0.5 13:23
          DEC 13, 2020
          incoming 192.168.0.6 09:34
          outgoing 192.168.0.6 14:12

If I add the ? like so:

^([A-Z] \d+, \d{4})(.*?)

Then I get only the dates.

First Match
Group 1 - DEC 12, 2020
Group 2 - white space

Second Match
Group 1 - DEC 13, 2020
Group 2 - white space

Can someone please tell me what I am missing? How can I get it to stop at the next match and not the end of the line or end of the text? All lines have a CRLF at the end. Thanks.
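
Both symptoms can be reproduced with a small sketch. It is written in Python rather than VB.NET, and it makes two assumptions that are not in the question: the month part is written as `[A-Z]{3}` (a bare `[A-Z]` matches only one letter, so it would not match `DEC`), and `.` is allowed to match line breaks (`re.DOTALL` here, which plays the role of `RegexOptions.Singleline` in .NET). Under those assumptions, a greedy `(.*)` takes everything up to the end of the text, while a lazy `(.*?)` with nothing after it is satisfied by matching nothing at all.

```python
import re

# Sample data from the question, with CRLF line endings.
text = (
    "DEC 12, 2020\r\n"
    "incoming 192.168.0.5 10:30\r\n"
    "outgoing 192.168.0.5 13:23\r\n"
    "DEC 13, 2020\r\n"
    "incoming 192.168.0.6 09:34\r\n"
    "outgoing 192.168.0.6 14:12\r\n"
)

# Assumption: month written as [A-Z]{3}; DOTALL lets "." cross line breaks.
flags = re.MULTILINE | re.DOTALL
greedy = re.search(r"^([A-Z]{3} \d+, \d{4})(.*)", text, flags)
lazy = re.search(r"^([A-Z]{3} \d+, \d{4})(.*?)", text, flags)

# Greedy: Group 2 swallows the second date and everything after it.
print(repr(greedy.group(2)))
# Lazy: nothing follows (.*?) in the pattern, so it stops immediately.
print(repr(lazy.group(2)))
```

So neither quantifier knows where the next date starts; the pattern has to say explicitly what a continuation line may not begin with.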

Why don't you simply split by newlines in whatever language you use. Every *not MOD 3* will be your 1st group, the rest is your second group. — Roko C. Buljan, Jan 07 '21 at 16:28
`^([A-Z]{3} \d+, \d{4})((?:\n(?![A-Z]{3} \d).*)*)`? See https://regex101.com/r/mZy8no/1 — Wiktor Stribiżew, Jan 07 '21 at 16:29
Sorry. I should have mentioned that most days will have a different number of entries. I simplified it above and used 2 lines for both. I should not have done that. — J_K_M_A_N, Jan 07 '21 at 16:45
Wiktor, I use this site to test my regex: http://regexstorm.net/tester since it seems to line up with VB.NET for me 99% of the time. Your regex gave me the same white space and all the dates. :( — J_K_M_A_N, Jan 07 '21 at 17:02
Use `(?m)^([A-Z]{3} \d+, \d{4})((?:\r?\n(?![A-Z]{3} \d).*)*)`, see [demo](http://regexstorm.net/tester?p=%28%3fm%29%5e%28%5bA-Z%5d%7b3%7d+%5cd%2b%2c+%5cd%7b4%7d%29%28%28%3f%3a%5cr%3f%5cn%28%3f!%5bA-Z%5d%7b3%7d+%5cd%29.*%29*%29&i=DEC+12%2c+2020%0d%0aincoming+192.168.0.5+10%3a30%0d%0aoutgoing+192.168.0.5+13%3a23%0d%0aDEC+13%2c+2020%0d%0aincoming+192.168.0.6+09%3a34%0d%0aoutgoing+192.168.0.6+14%3a12). — Wiktor Stribiżew, Jan 07 '21 at 17:03
Nailed it! Thanks Wiktor. Do you want to post a formal answer so I can accept it? Thank you for the help! — J_K_M_A_N, Jan 07 '21 at 17:07
Roko, I definitely could have done that (I use VB.net) but I really like regex and I want to expand my knowledge on that. That is why I wanted to learn this way. Thank you for the suggestion though. — J_K_M_A_N, Jan 07 '21 at 17:34

score 1 · Accepted Answer · answered Jan 07 '21 at 17:19


You can use

(?m)^([A-Z]{3} \d+, \d{4})((?:\r?\n(?![A-Z]{3} \d).*)*)

See the regex demo. Details:

(?m) - a RegexOptions.Multiline inline option
^ - start of a line
([A-Z]{3} \d+, \d{4}) - Group 1: three uppercase ASCII letters, space, one or more digits, a comma, a space and then four digits
((?:\r?\n(?![A-Z]{3} \d).*)*) - Group 2: zero or more occurrences of
- \r?\n - a CRLF or LF-only line break sequence...
- (?![A-Z]{3} \d) - that is not immediately followed by three uppercase ASCII letters, a space and a digit
- .* - the rest of the line.
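
The breakdown above can be tried out with a minimal sketch. It uses Python's `re` module instead of VB.NET, on the assumption that the two engines agree for the constructs in this pattern (inline `(?m)`, a non-capturing group, a negative lookahead). The sample text is the data from the question with CRLF line endings; the variable names are illustrative only.

```python
import re

# Sample data from the question, with CRLF line endings.
text = (
    "DEC 12, 2020\r\n"
    "incoming 192.168.0.5 10:30\r\n"
    "outgoing 192.168.0.5 13:23\r\n"
    "DEC 13, 2020\r\n"
    "incoming 192.168.0.6 09:34\r\n"
    "outgoing 192.168.0.6 14:12\r\n"
)

# The pattern from the answer, unchanged.
pattern = re.compile(r"(?m)^([A-Z]{3} \d+, \d{4})((?:\r?\n(?![A-Z]{3} \d).*)*)")

# Group 2 begins with a line break, and "." also matches "\r",
# so split it into lines and drop the empty ones.
groups = [
    (m.group(1), [line for line in m.group(2).splitlines() if line])
    for m in pattern.finditer(text)
]

for date, entries in groups:
    print(date, entries)
```

Each match ends just before the next line that starts like a date, so the number of entries per date does not have to be the same.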


Wiktor Stribiżew


Thank you very much. I did not know about nesting the \r?\n like that. That is why I like to post on this site. So I can learn something new and have reference to it. :) Thanks again! – J_K_M_A_N Jan 07 '21 at 17:26
@J_K_M_A_N `(?:...)` is a [non-capturing group](https://stackoverflow.com/questions/3512471/what-is-a-non-capturing-group-in-regular-expressions) that is used to *group* sequences of patterns, so that they could be quantified together as a single group. The capturing parentheses around this `(?:...)*` have a side-effect of keeping the initial line break in the Group 2 value, so you should `.Trim()` the value or regroup the patterns repeating them: `(?m)^([A-Z]{3} \d+, \d{4})\r?\n((?![A-Z]{3} \d).*(?:\r?\n(?![A-Z]{3} \d).*)*)` – Wiktor Stribiżew Jan 07 '21 at 17:35
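
The regrouped pattern from this comment can be checked the same way. This is a sketch in Python rather than VB.NET, assuming the engines behave alike for these constructs; `bodies` and `only_dates` are illustrative names.

```python
import re

# Sample data from the question, with CRLF line endings.
text = (
    "DEC 12, 2020\r\n"
    "incoming 192.168.0.5 10:30\r\n"
    "outgoing 192.168.0.5 13:23\r\n"
    "DEC 13, 2020\r\n"
    "incoming 192.168.0.6 09:34\r\n"
    "outgoing 192.168.0.6 14:12\r\n"
)

# Regrouped pattern: the first line break now sits outside Group 2.
pattern = re.compile(
    r"(?m)^([A-Z]{3} \d+, \d{4})\r?\n((?![A-Z]{3} \d).*(?:\r?\n(?![A-Z]{3} \d).*)*)"
)

bodies = [m.group(2) for m in pattern.finditer(text)]
for body in bodies:
    print(repr(body))  # no leading line break, but a trailing "\r" can remain

# Caveat: a date line directly followed by another date line is skipped,
# because the first line after the date must not look like a date.
only_dates = [m.group(1) for m in pattern.finditer("DEC 12, 2020\r\nDEC 13, 2020\r\n")]
print(only_dates)
```

Because `.` matches `\r`, the captured text can still end with a carriage return, so trimming the value remains useful with CRLF input.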
Thanks. I usually use a .Replace(vbCr,"") when reading these groups. (Or vbCrLf) Roko is right that it would be much easier to read line by line, but again, I like regex. :) – J_K_M_A_N Jan 07 '21 at 17:41
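
For comparison, the line-by-line approach mentioned in the comments might look like the sketch below. It is written in Python rather than VB.NET, and it assumes a date line has exactly the shape that Group 1 of the answer matches; `date_line` and `grouped` are illustrative names, not anything from the thread.

```python
import re

# Sample data from the question, with CRLF line endings.
text = (
    "DEC 12, 2020\r\n"
    "incoming 192.168.0.5 10:30\r\n"
    "outgoing 192.168.0.5 13:23\r\n"
    "DEC 13, 2020\r\n"
    "incoming 192.168.0.6 09:34\r\n"
    "outgoing 192.168.0.6 14:12\r\n"
)

# Assumption: a date line is three uppercase letters, day, comma, year.
date_line = re.compile(r"[A-Z]{3} \d+, \d{4}")

grouped = []  # list of (date, [entries]) pairs, in input order
for line in text.splitlines():  # splitlines() handles both CRLF and LF
    if date_line.fullmatch(line):
        grouped.append((line, []))     # a new date starts a new group
    elif grouped and line:
        grouped[-1][1].append(line)    # any other line belongs to the last date

for date, entries in grouped:
    print(date, entries)
```

This does not depend on every date having the same number of entries, and the collected lines carry no stray `\r` characters.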
