
I have been working on the following regular expression:

/(?<=\@Comment\{Annotation: key:START;\})( )/

which is designed to find an annotation that looks like @Comment{Annotation: key:START;} in a text file. These annotations mark the lines where the file could be split into smaller files.

I am having trouble completing the final capture group, the trailing ( ), so that it captures all remaining lines up to either the end of the string (EOF) or the next annotation matching this pattern.

I am hoping not to have to convert this to a line-based approach with checks performed on each line...

I thought one of the following might work, but so far nothing has:

  • \s
  • \Z
  • \s*(.*) --> this works in the sense that I can manually repeat the sequence to capture each line, one at a time, but that is highly impractical
EngBIRD
  • You don't need to add a capture group to consume the rest of the file. You should be able to just find all matches on `/(?<=\@Comment\{Annotation: key:START;\})/`. What programming language are you using? It probably has something like `regex.findAll` available. – Blorgbeard Jun 30 '16 at 22:03
  • Thanks @Blorgbeard, I have tried that with no success. I don't know how long this link will remain active, but it should be good for a few days. It might demonstrate better what I am trying to accomplish: https://regex101.com/r/nF7bM6/2 – EngBIRD Jun 30 '16 at 22:25
  • Add the `g` (global) modifier: https://regex101.com/r/cF0cD2/1 – Blorgbeard Jun 30 '16 at 22:32
  • @Blorgbeard I think I must be missing something. I am testing this out with a quick Java program (regex `Pattern` and `Matcher` objects). Not sure where to start debugging though, because (in my browser) your link doesn't return anything; all I see is: `No match groups were extracted`, but I assumed this was because of: `g modifier: global. All matches (don't return on first match)` – EngBIRD Jun 30 '16 at 22:43
  • It doesn't look like it returns anything because of your lookbehind. Changing it to a normal group: https://regex101.com/r/cF0cD2/2 – Blorgbeard Jun 30 '16 at 22:44
  • Are you just trying to search for the annotations? Or are you trying to get all the text after the annotations? – swlim Jun 30 '16 at 22:45
  • I don't really speak Java these days, but it sounds like your real question needs to be "How do I find all matches of a regex in Java?" – Blorgbeard Jun 30 '16 at 22:45
  • @Blorgbeard - Correction: I don't want the matches, I want all the text that follows it. – EngBIRD Jun 30 '16 at 22:47
  • @SWLim I don't want the match on the annotations, I am trying to get the text after it. If my string was a constant, I could just use a function in Java like: `String[] distinctFiles = content.split("@Comment{Annotation: key:START;}");`, but I am hoping to have a bit more flexibility and learn something by using regex. – EngBIRD Jun 30 '16 at 22:50
  • Do you want the separator text to be included in either side of the split text? – Blorgbeard Jun 30 '16 at 22:52
  • @Blorgbeard For my actual use it's complicated, and I will explain, but `I would be happy to have it removed and not appear on either side` for the purposes of a clean answer to this question. I expect when using a more constant version of the string that I will remove it, but when it is pattern based and contains information like the file name or path, I will keep it. For example, on the same block of text, I can run: `(?<=\\@Comment\\{OriginalFile: path:)(.*?)(?=;\\})` which I have found to work. – EngBIRD Jun 30 '16 at 22:55
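For the constant-separator case mentioned in the comments, a minimal sketch follows. It assumes the separator is exactly the literal annotation text; the class and method names (`LiteralSplit`, `splitOnAnnotation`) are made up for illustration. Because `String.split` treats its argument as a regular expression, the braces in the annotation would most likely be misread as a quantifier, so the sketch wraps the separator in `Pattern.quote` to have it matched literally.

```java
import java.util.regex.Pattern;

class LiteralSplit {
    // Hypothetical constant: the literal annotation text from the question.
    static final String SEPARATOR = "@Comment{Annotation: key:START;}";

    // Splits on the literal separator; Pattern.quote keeps '{' and '}'
    // from being interpreted as regex syntax.
    static String[] splitOnAnnotation(String content) {
        return content.split(Pattern.quote(SEPARATOR));
    }

    public static void main(String[] args) {
        String content = "intro" + SEPARATOR + "A" + SEPARATOR + "B";
        for (String part : splitOnAnnotation(content)) {
            System.out.println("[" + part + "]");
        }
    }
}
```

This drops the separator from both sides of each piece, which matches the "removed and not appear on either side" behaviour asked for above, but it cannot capture variable parts of the annotation.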

1 Answer


This regex should work:

(.*?)((\@Comment\{Annotation: key:START;\})|$)

See example online.

The (.*?) matches the text up until your separator expression. Then follows an expression which matches either your separator, or the end of the document ($).

For each match, the first group gives you the text before the separator, and the second group is the matched separator text.

This expression needs single-line mode s and global mode g.
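In Java there is no `g` modifier; the equivalent is to call `Matcher.find()` in a loop, and single-line mode corresponds to `Pattern.DOTALL`. The following is a rough sketch under those assumptions, not a definitive implementation. The class and method names (`AnnotationSplitter`, `chunks`) are invented for illustration, and trimming each piece and skipping empty ones is an added choice, because the pattern also produces empty matches (for example at the very end of the input).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnnotationSplitter {
    // Same expression as above, compiled with DOTALL so '.' also matches newlines.
    private static final Pattern CHUNK = Pattern.compile(
            "(.*?)((@Comment\\{Annotation: key:START;\\})|$)", Pattern.DOTALL);

    // Returns the text pieces found between annotations (or before end of input).
    static List<String> chunks(String content) {
        List<String> result = new ArrayList<>();
        Matcher m = CHUNK.matcher(content);
        while (m.find()) {                    // repeated find() plays the role of 'g'
            String text = m.group(1).trim();  // group 1 = text before the separator
            if (!text.isEmpty()) {            // assumption: empty pieces are not wanted
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sep = "@Comment{Annotation: key:START;}";
        String content = sep + "\nfirst file\n" + sep + "\nsecond file\n";
        for (String chunk : chunks(content)) {
            System.out.println("---\n" + chunk);
        }
    }
}
```

If the separator text itself is needed (for instance when it carries a file name), `m.group(2)` holds it for each match, and is empty when the match ended at the end of the input.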

Blorgbeard
  • For the first capture group, shouldn't it be `(.*?)` instead of `(.+?)` ? Otherwise the first annotation will not be captured. Unless I'm missing a requirement? – swlim Jun 30 '16 at 23:07
  • @SWLim you're right, I missed the first separator. Changing it to `(.*?)` gives you a first match with no text and just the separator. See new example link. – Blorgbeard Jun 30 '16 at 23:13
  • I think Java doesn't accept the modifiers outside of the `/ .... /`, so is there a way for them to be included as part of the expression itself? A quick search showed me that some modifiers can be placed at the beginning, e.g. `(?u)`, but this doesn't work either. – EngBIRD Jun 30 '16 at 23:16
  • Looks like you can use `Pattern.DOTALL` and then repeatedly call `find`. See http://stackoverflow.com/questions/3651725/match-multiline-text-using-regular-expression and https://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html#find() – Blorgbeard Jun 30 '16 at 23:21
  • Thanks, I had seen that question, and had already added the multiline part, but I didn't interpret the DOTALL part correctly, and didn't think it was appropriate for me to include. Testing a little more, but I think `Pattern.MULTILINE | Pattern.DOTALL` may have done it. – EngBIRD Jun 30 '16 at 23:26
  • I don't think you want MULTILINE - you don't want `$` to match end of line. I'd try with just DOTALL. – Blorgbeard Jun 30 '16 at 23:27
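To make the difference between the two flag combinations concrete, here is a small comparison sketch. The names `FlagComparison` and `chunks` are hypothetical, and empty pieces are trimmed and skipped as an added assumption. With `Pattern.MULTILINE`, `$` also matches before each line terminator, so the lazy `(.*?)` stops at the end of every line instead of running on to the next annotation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FlagComparison {
    static final String REGEX = "(.*?)((@Comment\\{Annotation: key:START;\\})|$)";

    // Collects the non-empty, trimmed group-1 texts using the given flags.
    static List<String> chunks(String content, int flags) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(REGEX, flags).matcher(content);
        while (m.find()) {
            String text = m.group(1).trim();
            if (!text.isEmpty()) {
                result.add(text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sep = "@Comment{Annotation: key:START;}";
        String content = sep + "\nline1\nline2\n" + sep + "\nline3";
        // DOTALL only: one piece per annotation block.
        System.out.println(chunks(content, Pattern.DOTALL).size());
        // DOTALL | MULTILINE: '$' matches at every line end, so pieces are single lines.
        System.out.println(chunks(content, Pattern.DOTALL | Pattern.MULTILINE).size());
    }
}
```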