I'm trying to create a large regex expression where the plan is to capture 6 groups. Is gonna be used to parse some Android log that have the following format:
2020-03-10T14:09:13.3250000 VERB CallingClass 17503 20870 Whatever content: this log line had (etc)
The expression I've created so far is the following:
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(\w{+})\t(\d{5})\t(\d{5})\t(.*$)
The lines in this case are Tab separated, although the application that I'm developing will be dynamic to the point where this is not always the case, so regex I feel is still the best option even if heavier then performing a split.
Breaking down the groups in more detail from my though process:
Matches the date (I'm considering changing this to a x number of characters instead)
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})
Match a block of 4 characters
([A-Za-z]{4})
Match any number of characters until the next tab
(\w{+})
Match a block of 5 numbers 2 times
\t(\d{5})
At last, match everything else until the end of the line. \t(.*$)
If I use a reduced expression to the following it works:
(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(.*$)
This doesn't include 3 of the groups, the word and the 2 numbers blocks.
Any idea why is this?
Thank you.