I am attempting to parse a flavour of markdown that has some keywords in double quotes or angle brackets.
Words between double quotes (") are static keywords, and the ones between angle brackets (< and >) are dynamic.
Sample:
* Say "hello" to "world"
* Say <something> to <somebody>
* I can also be a plain statement
The logic goes like this:
- Find all lines that start with *
- Check whether the line has any keywords.
- Extract keywords if any.
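A minimal sketch of these three steps, assuming Python and its `re` module (no language is named here, so that choice is an assumption). The names `KEYWORD` and `extract_keywords` and the pattern itself are illustrative only. This is the step-by-step version that works line by line, so it does not track positions in the whole string.

```python
import re

# Hypothetical pattern: group 1 is a "static" keyword, group 2 a <dynamic> one.
KEYWORD = re.compile(r'"([^"\n]+)"|<([^<>\n]+)>')

def extract_keywords(text):
    """Return (line, static_keywords, dynamic_keywords) for every '*' line."""
    results = []
    for line in text.splitlines():
        stripped = line.lstrip()
        # Step 1: keep only lines that start with *
        if not stripped.startswith("*"):
            continue
        static, dynamic = [], []
        # Steps 2 and 3: look for keywords and collect them by kind
        for m in KEYWORD.finditer(stripped):
            if m.group(1) is not None:
                static.append(m.group(1))
            else:
                dynamic.append(m.group(2))
        results.append((stripped, static, dynamic))
    return results

sample = '''* Say "hello" to "world"
* Say <something> to <somebody>
* I can also be a plain statement'''

for line, static, dynamic in extract_keywords(sample):
    print(line, static, dynamic)
```

A plain statement simply comes back with two empty lists, so the caller can tell it apart from a line that carries keywords.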
I have a simple regex, \W+(\*.+), that helps me extract the line, but I am not sure how to extend it to extract the values between quotes or angle brackets.
UPDATE 1
So, after a hint from @EvanKnowles' link, I came up with this regex, which seems to work, but I'd be happy to get any improvements on it.
[ ]*\*([\w ]*(["\<][\w ]+["\>])*)*
UPDATE 2
A few people have suggested doing this in steps, i.e. getting all valid lines in a first pass and then looking up the keywords in each line. I'd like to keep this as my last option. The context is that the consumer of this code needs to know the keywords and their positions in the entire string, so splitting the parent string would add the overhead of maintaining offsets.
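One way the offset concern might be handled without splitting the parent string, sketched in Python under the assumption that its `re` module is acceptable: a compiled pattern's `finditer` accepts `pos` and `endpos` arguments, so the keyword search can be limited to each matching line while the reported positions stay relative to the entire string. The names `LINE`, `KEYWORD` and `keywords_with_offsets` are hypothetical, and the patterns are only one possible reading of the format.

```python
import re

# Hypothetical patterns: a whole line starting with optional indentation and *,
# and a "static" (group 1) or <dynamic> (group 2) keyword.
LINE = re.compile(r'^[ \t]*\*.*$', re.MULTILINE)
KEYWORD = re.compile(r'"([^"\n]+)"|<([^<>\n]+)>')

def keywords_with_offsets(text):
    """Return (kind, keyword, start) tuples; start is an index into text."""
    found = []
    for line in LINE.finditer(text):
        # Search only inside this line's span; the string is never sliced,
        # so every match offset remains absolute.
        for m in KEYWORD.finditer(text, line.start(), line.end()):
            group = 1 if m.group(1) is not None else 2
            kind = "static" if group == 1 else "dynamic"
            found.append((kind, m.group(group), m.start(group)))
    return found

text = 'Sample:\n* Say "hello" to <somebody>\nnot a "keyword" line\n'
for kind, word, start in keywords_with_offsets(text):
    # Each offset points back into the original, unsplit string.
    print(kind, word, start, text[start:start + len(word)] == word)
```

This is still two patterns rather than one regex, but it avoids the bookkeeping of per-line offsets, which was the stated objection to the stepwise approach.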