
I'm new to regex, but I seem to have things going my way.

See https://regex101.com/r/Is8wZK/1. Group 8 might have more than one word in it, separated by a space, but as you can see, so does group 5, and I've already used up my one-time use of (.+).

How can I rewrite my regex to detect group 8 in exactly the way group 5 is detected?

David Kachlon
  • Are there any other things that determine the other groups? A similar pattern? I see that group 1 is a date; you might think of using `(\d+/\d+/\d+)` instead of `\S` – Isaac May 17 '18 at 22:38
  • There's this... https://snag.gy/HQnxlq.jpg – David Kachlon May 17 '18 at 22:40
  • Tweaked your regex a bit but I think you can better specify each group, either way this seems to work: https://regex101.com/r/UpccF3/2 – Denny Ferrassoli May 17 '18 at 22:54
  • I'm guessing this is meant to be used for logging, where you know with certainty that every group always has at least one character, right? The way you have structured the regex, nothing will be detected if one of the columns does not provide data. If group 9 can contain a space, part of group 9 gets moved over to group 8. – elektronet May 18 '18 at 01:30
  • Please include your current regex in the question, not in a link – Zoe May 18 '18 at 07:59
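
To illustrate Isaac's suggestion in the first comment: a field-specific pattern such as `\d+/\d+/\d+` rejects tokens that `\S+` would happily accept. A minimal Python sketch; the sample tokens are hypothetical, not taken from the asker's data.

```python
import re

# Field-specific pattern from the comment vs. the generic "any run of non-space characters".
date_field = re.compile(r"\d+/\d+/\d+")
any_token = re.compile(r"\S+")

# Hypothetical tokens: one date-like, one not.
for token in ["05/17/18", "A123"]:
    print(token, bool(date_field.fullmatch(token)), bool(any_token.fullmatch(token)))
# 05/17/18 True True
# A123 False True
```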

2 Answers

^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+((?:[[:alpha:]]+)(?:\s+[[:alpha:]]+)*)\s+(\S+)\s+(\S+)\s+((?:[[:alpha:]]+)(?:\s+[[:alpha:]]+)*)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$

Link: https://regex101.com/r/v4mEJK/1
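
If you want to try the pattern outside regex101, here is a minimal Python sketch. Python's built-in `re` module does not understand POSIX bracket classes like `[[:alpha:]]`, so each one is rewritten as `[A-Za-z]` (ASCII letters only, which is an assumption about the data). The sample line is hypothetical, shaped after the column layout discussed in this thread, not the asker's real input.

```python
import re

# Same structure as the answer's regex; [[:alpha:]] becomes [A-Za-z] for Python's re.
PATTERN = re.compile(
    r"^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+"
    r"((?:[A-Za-z]+)(?:\s+[A-Za-z]+)*)\s+"  # group 5: one or more words
    r"(\S+)\s+(\S+)\s+"
    r"((?:[A-Za-z]+)(?:\s+[A-Za-z]+)*)\s+"  # group 8: one or more words
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\S+)$"
)

# Hypothetical input line with two multi-word name columns.
line = "05/17/18 A123 45 6 Blue Widget 12ABC34 7 John Smith X 1.50 2.25 3.75"

m = PATTERN.match(line)
print(m.group(5))  # Blue Widget
print(m.group(8))  # John Smith
```

Group 8 stops before `X` only because the pattern still needs four more fields after it: the engine backtracks the greedy `*` until the trailing fields fit.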

Pretty much all you need to do is match a group of alphabetic characters followed by zero or more groups of whitespace plus alphabetic characters. That captures names which may or may not have more than one word, and it is done by using

((?:[[:alpha:]]+)(?:\s+[[:alpha:]]+)*)

for groups 5 and 8.
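
The sub-pattern on its own, checked with `re.fullmatch` against a few hypothetical values (again with `[A-Za-z]` standing in for `[[:alpha:]]`, since Python's `re` has no POSIX classes):

```python
import re

# One alphabetic word, then zero or more (whitespace + alphabetic word) repeats.
WORDS = re.compile(r"(?:[A-Za-z]+)(?:\s+[A-Za-z]+)*")

samples = ["John", "John Smith", "Mary Ann Lee", "John 42", "R2D2"]
results = {s: bool(WORDS.fullmatch(s)) for s in samples}
print(results)
# {'John': True, 'John Smith': True, 'Mary Ann Lee': True, 'John 42': False, 'R2D2': False}
```

Anything containing a digit is rejected, which is what keeps these groups from swallowing the neighboring numeric columns.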

The rest of the regex could possibly be made more specific, but there isn't really any need to add more complexity unless your input text is significantly more complex than your test case.

FWIW, it's far better to use \s+ instead of a raw space between groups, so you can also match other delimiting whitespace such as tabs or runs of spaces.
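
A quick Python check of the `\s+` point, using hypothetical two-column lines (the name part again uses `[A-Za-z]` in place of `[[:alpha:]]`):

```python
import re

# Hypothetical lines: one space-delimited, one tab-delimited.
space_line = "A123 Blue Widget"
tab_line = "A123\tBlue Widget"

raw_space = re.compile(r"^(\S+) ((?:[A-Za-z]+)(?:\s+[A-Za-z]+)*)$")
any_space = re.compile(r"^(\S+)\s+((?:[A-Za-z]+)(?:\s+[A-Za-z]+)*)$")

print(bool(raw_space.match(space_line)))  # True
print(bool(raw_space.match(tab_line)))    # False: a tab is not a literal space
print(bool(any_space.match(tab_line)))    # True
```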

Joey Pabalinas

I redid your generic capture groups into this:

^(\d+\/\d+\/\d+) ([A-Z]\d+) (\d+) (\d+) (.+) (\d+[A-Z]{3}\d+) (\d+) (.+) ([A-Z]) (\d+\.\d+) (\d+\.\d+) (\d+\.\d+)$

Breaking that down:

  • (\d+\/\d+\/\d+): this matches the date (digits separated by slashes)
  • ([A-Z]\d+): this matches a capital letter followed by one or more digits
  • (\d+): this matches a number (one or more digits)
  • (\d+): this matches a number
  • (.+): this is the first general group (group 5)
  • (\d+[A-Z]{3}\d+): this matches one or more digits, followed by exactly 3 capitals, followed by one or more digits
  • (\d+): this matches a number
  • (.+): this is the second general group (group 8)
  • ([A-Z]): this matches a single capital letter
  • (\d+\.\d+): this matches a number with a decimal point
  • (\d+\.\d+): this matches a number with a decimal point
  • (\d+\.\d+): this matches a number with a decimal point
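
Here is that regex run in Python against a hypothetical line. The two greedy `(.+)` groups are pinned in place by the stricter fields around them, so each one can still contain spaces.

```python
import re

# The 12-group regex above, unchanged apart from being split over two raw strings.
PATTERN = re.compile(
    r"^(\d+\/\d+\/\d+) ([A-Z]\d+) (\d+) (\d+) (.+) (\d+[A-Z]{3}\d+) "
    r"(\d+) (.+) ([A-Z]) (\d+\.\d+) (\d+\.\d+) (\d+\.\d+)$"
)

# Hypothetical sample line (single spaces between columns, as the regex requires).
line = "05/17/18 A123 45 6 Blue Widget 12ABC34 7 John Smith X 1.50 2.25 3.75"

m = PATTERN.match(line)
for number, value in enumerate(m.groups(), start=1):
    print(number, repr(value))
```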

This should help you get what you want.


If you are only interested in groups 5 and 8, try non-capturing groups for the other fields (groups 5 and 8 then become groups 1 and 2):

^(?:\d+\/\d+\/\d+) (?:[A-Z]\d+) (?:\d+) (?:\d+) (.+) (?:\d+[A-Z]{3}\d+) (?:\d+) (.+) (?:[A-Z]) (?:\d+\.\d+) (?:\d+\.\d+) (?:\d+\.\d+)$

Or only group what you need:

^\d+\/\d+\/\d+ [A-Z]\d+ \d+ \d+ (.+) \d+[A-Z]{3}\d+ \d+ (.+) [A-Z] \d+\.\d+ \d+\.\d+ \d+\.\d+$
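
Either variant leaves only the two interesting captures. A Python sketch comparing both on a hypothetical line:

```python
import re

# Hypothetical sample line.
line = "05/17/18 A123 45 6 Blue Widget 12ABC34 7 John Smith X 1.50 2.25 3.75"

# Variant 1: every other field wrapped in a non-capturing (?:...) group.
non_capturing = re.compile(
    r"^(?:\d+\/\d+\/\d+) (?:[A-Z]\d+) (?:\d+) (?:\d+) (.+) (?:\d+[A-Z]{3}\d+) "
    r"(?:\d+) (.+) (?:[A-Z]) (?:\d+\.\d+) (?:\d+\.\d+) (?:\d+\.\d+)$"
)
# Variant 2: parentheses only around the two fields of interest.
only_needed = re.compile(
    r"^\d+\/\d+\/\d+ [A-Z]\d+ \d+ \d+ (.+) \d+[A-Z]{3}\d+ "
    r"\d+ (.+) [A-Z] \d+\.\d+ \d+\.\d+ \d+\.\d+$"
)

print(non_capturing.match(line).groups())  # ('Blue Widget', 'John Smith')
print(only_needed.match(line).groups())    # ('Blue Widget', 'John Smith')
```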
Isaac
  • I sincerely appreciate your contribution. Please excuse me for not including more data. This is very close to what I need. Group 5 is not always a number followed by 3 capitals; in fact, it has almost no organization to it whatsoever. Here are some more examples of values it could take: PT301WHT06, 20007TWBLK, etc. – David Kachlon May 17 '18 at 23:00
  • @DavidKachlon that's ok, I think that you should be able to work on that specific example and get it going yourself. Hint: put characters in `[]`. See: https://stackoverflow.com/questions/9801630/what-is-the-difference-between-square-brackets-and-parentheses-in-a-regex – Isaac May 17 '18 at 23:03
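
Following the `[]` hint: the two codes quoted above fail `\d+[A-Z]{3}\d+`, while a character class such as `[A-Z0-9]+` accepts both. This is one possible loosening, assumed from just those two examples, not a spec for the field.

```python
import re

# Codes quoted in the comment above.
codes = ["PT301WHT06", "20007TWBLK"]

# Original field pattern from the answer.
old_code = re.compile(r"\d+[A-Z]{3}\d+")
# Assumed loosening: any run of uppercase letters and digits.
new_code = re.compile(r"[A-Z0-9]+")

for code in codes:
    print(code, bool(old_code.fullmatch(code)), bool(new_code.fullmatch(code)))
# PT301WHT06 False True
# 20007TWBLK False True
```

Because `[A-Z0-9]+` is looser, swapping it into the full line in place of `(\d+[A-Z]{3}\d+)` may make the split between the `(.+)` groups ambiguous if other columns can also contain uppercase-and-digit tokens.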