
I am trying to split lines of data in Excel using VBA, with regex to match patterns. All of my data is in one column, and each row needs to be split into 3 parts.

My question is about how adjacent regex terms interact, and what causes the engine to move on to the next term.

For instance, I have a line that reads:

"([A-Z]{3})(\W{5,})(.+)(\|\d\.\d)"

Should I read this as "any 3 capital letters, followed by at least 5 non-word characters, then everything up to and including bar digit dot digit (and no further)"? Or is the .+ going to sprawl to the end of my data until it hits a line break?

I guess what I am wondering is whether a new term will interrupt a previous term (such as .+ stopping at "|digit.digit" above).

Any assistance in clearing this up for me would be super appreciated, thank you in advance.

Edit: Example

ABC|^-\%!lkaddghlk shfdahah|$^~436346dghdhg|^dgf^356||P|7.7XYZ~^!HYU52

Would this capture only

ABC|^-\%!lkaddghlk shfdahah|$^~436346dghdhg|^dgf^356||P|7.7

because the last term is |digit.digit, or would it capture everything because of the .+ in the 3rd capture group?

Edit:

Thanks everyone in the comments, your feedback really helped me out!

Scott T
  • The best post ever about Regex in Excel and VBA: http://stackoverflow.com/questions/22542834/how-to-use-regular-expressions-regex-in-microsoft-excel-both-in-cell-and-loops/22542835#22542835 – mielk Jul 28 '15 at 13:37
  • That is a neat little tool, thanks for sharing it. I am still a little shaky on whether the 4th capture group will interrupt the 3rd. It looks like it would according to regex101; is this true? – Scott T Jul 28 '15 at 13:43
  • What is the exact problem you are trying to solve? Post the code, sample input and expected output. Where is the regex101 fiddle for us to check what you mean? – Wiktor Stribiżew Jul 28 '15 at 13:46
  • Sorry, my comment above was a reply to a comment that is no longer visible. Here is the regex101: https://regex101.com/r/uD4uJ0/1, and the original post has been updated. – Scott T Jul 28 '15 at 13:52
  • I don't really get it, but yes: *capture only...because the last term is |digit.digit*. `.+` will [greedily](http://www.regular-expressions.info/repeat.html#greedy) eat up as much as possible until meeting the `(\|\d\.\d)` part of your pattern (which of course is required for a successful match). – Jonny 5 Jul 28 '15 at 13:59
  • Cool, thanks for the help! – Scott T Jul 28 '15 at 14:02
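The greedy behavior described in the comments can be checked with a small sketch. It is written in Python rather than VBA, on the assumption that the VBA regex engine treats greedy quantifiers the same way for this pattern. The sample line is hypothetical (not from the post) and deliberately contains two `|digit.digit` candidates, to show which one a greedy `.+` settles on; the lazy `.+?` variant is included only for contrast.

```python
import re

# Hypothetical sample line with TWO "|digit.digit" candidates (not from the post).
line = r"ABC|^-\%!foo|1.2bar|3.4baz"

# Original pattern: greedy .+ runs to the end of the line, then backtracks
# until the final group can match, so it stops at the LAST "|digit.digit".
greedy = re.search(r"([A-Z]{3})(\W{5,})(.+)(\|\d\.\d)", line)

# Lazy variant for contrast: .+? grows one character at a time,
# so it stops at the FIRST "|digit.digit".
lazy = re.search(r"([A-Z]{3})(\W{5,})(.+?)(\|\d\.\d)", line)

# Without the final group, nothing limits .+ and it consumes the rest of the line.
no_tail = re.search(r"([A-Z]{3})(\W{5,})(.+)", line)

print(greedy.group(3), greedy.group(4))  # foo|1.2bar |3.4
print(lazy.group(3), lazy.group(4))      # foo |1.2
print(no_tail.group(3))                  # foo|1.2bar|3.4baz
```

So a later term does limit `.+`, but only through backtracking: with several candidates in a row of data, the greedy form picks the last one.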

1 Answer


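Thanks for the help, commenters. I now understand that a later term does limit an earlier one in the above pattern: the greedy .+ takes as much as it can, then gives characters back until |digit.digit can match.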
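For reference, here is a minimal sketch that runs the pattern against the example line from the question. It uses Python instead of VBA, assuming both engines agree on this pattern; the variable names are illustrative only.

```python
import re

# Pattern from the question, unchanged.
PATTERN = re.compile(r"([A-Z]{3})(\W{5,})(.+)(\|\d\.\d)")

# Example line from the question (raw string keeps the backslash literal).
sample = r"ABC|^-\%!lkaddghlk shfdahah|$^~436346dghdhg|^dgf^356||P|7.7XYZ~^!HYU52"

match = PATTERN.search(sample)

# Print each capture group, then whatever the match left behind.
for index, text in enumerate(match.groups(), start=1):
    print(index, repr(text))

leftover = sample[match.end():]
print("unmatched tail:", repr(leftover))
```

The match ends at `|7.7`, and the trailing `XYZ~^!HYU52` is left out of every group.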

Scott T