I have a string like this:
Received @ 10/10/2014 02:29:55 a.m. Changed status: 'processing' @ 10/10/2014 02:40:20 a.m. Changed status: 'processed' @ 10/10/2014 02:40:24 a.m.
I need to "parse" this string using certain rules:
- The first block is the "Received" date and time
- Each block after the first one starts with "Changed status:" and ends with a date and time
- There can be any number of "Changed status:" blocks (at least 1) and the status can vary
What I need to do is to:
- Split the string and put each block into an array.
Example:
[Received @ 10/10/2014 02:29:55 a.m.], [Changed status: 'processing' @ 10/10/2014 02:40:20 a.m.], [Changed status: 'processed' @ 10/10/2014 02:40:24 a.m.]
- After each block is split, I need to split each entry into three fields
For the above example, what I need is something like this:
Received | NULL | 10/10/2014 02:29:55 am
Changed status | processing | 10/10/2014 02:40:20 am
Changed status | processed | 10/10/2014 02:40:24 am
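To make the target concrete, here is a minimal sketch of that second step in base R. It assumes the blocks are already in a character vector, that a status (when present) is always wrapped in single quotes, and that "@" only ever appears right before the date. The column names `event`, `status` and `when` are made up for illustration, and the date is left as text rather than converted.

```r
# Blocks as they would look after step one (copied from the example above)
blocks <- c(
  "Received @ 10/10/2014 02:29:55 a.m.",
  "Changed status: 'processing' @ 10/10/2014 02:40:20 a.m.",
  "Changed status: 'processed' @ 10/10/2014 02:40:24 a.m."
)

# Field 1: everything before the first ':' or '@'
event <- trimws(sub("[:@].*$", "", blocks))

# Field 2: the text between single quotes, or NA when the block has none
status <- ifelse(
  grepl("'", blocks, fixed = TRUE),
  sub("^.*'([^']*)'.*$", "\\1", blocks),
  NA_character_
)

# Field 3: everything after the '@' (assumed to be the date and time)
when <- trimws(sub("^.*@", "", blocks))

# Hypothetical column names; stringsAsFactors = FALSE keeps plain strings
parsed <- data.frame(event = event, status = status, when = when,
                     stringsAsFactors = FALSE)
print(parsed)
```

This leaves the "a.m." text untouched; turning it into a real date-time would be a separate step.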
I think step two is quite easy (each block can be split using @ and : as separators), but step one is making me pull my hair out. Is there a way to do this kind of thing with regular expressions?
I've tried some approaches (like Received|Changed.*[ap].m.), but they don't work (evaluating the regular expression always returns the full string).
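One likely reason the attempt grabs too much is that `.*` is greedy: it runs to the last "a.m." in the string, so every "Changed status:" block ends up in a single match. The unescaped dots in `[ap].m.` also match any character. The sketch below compares that pattern with a lazy variant (`.*?`, with the dots escaped) using `gregexpr()` and `regmatches()`. It assumes every block ends in "a.m." or "p.m." and that this text never occurs inside a status.

```r
x <- "Received @ 10/10/2014 02:29:55 a.m. Changed status: 'processing' @ 10/10/2014 02:40:20 a.m. Changed status: 'processed' @ 10/10/2014 02:40:24 a.m."

# Attempted pattern: "Received" matches on its own, then the greedy ".*"
# stretches from the first "Changed" to the end of the string
greedy <- regmatches(x, gregexpr("Received|Changed.*[ap].m.", x))[[1]]

# Lazy variant: ".*?" stops at the first "a.m."/"p.m." after each block start.
# perl = TRUE selects PCRE, which supports the lazy quantifier
lazy <- regmatches(
  x,
  gregexpr("(Received|Changed).*?[ap]\\.m\\.", x, perl = TRUE)
)[[1]]

print(greedy)  # 2 matches, the second one spanning both status blocks
print(lazy)    # 3 matches, one per block
```

If the lazy pattern holds up on real data, `lazy` would be the array of blocks described in step one.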
I want to do this in R:
- Read the full data table (which has more fields, and the text above is the last one) into a data frame
- "Parse" this string and store it into a second data frame
R has built-in support for regular expressions, so that's my first thought on approaching the solution.
Any help will be appreciated. Honestly, I'm lost here (but I'll keep on trying... I'll edit my post if I find steps that bring me closer to the solution)