4

I'm trying to parse the following COBOL code in Java.

    PNAME.                                                                  P000
    084500     MOVE        src1 TO dest1                                    P110
    084510     MOVE        src2 TO dest2                                    P111
    084520     MOVE        src3 TO dest3                                    P115
    084530     MOVE        src4 TO dest4                                    P120
    084540     MOVE        src5 TO dest5.                                   P140
    084550     PERFORM     TOTO THRU TOTO-FN.                               P310

My goal is to find the MOVE statement corresponding to a given field name.
For example, given dest5 I want to find "MOVE src5 TO dest5."

My Java code is:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Lines are joined with explicit "\n" because a Java string literal cannot span several source lines.
    String paragraphePart =
          "PNAME.                                                                  P000\n"
        + "084500     MOVE        src1 TO dest1                                    P110\n"
        + "084510     MOVE        src2 TO dest2                                    P111\n"
        + "084520     MOVE        src3 TO dest3                                    P115\n"
        + "084530     MOVE        src4 TO dest4                                    P120\n"
        + "084540     MOVE        src5 TO dest5.                                   P140\n"
        + "084550     PERFORM     TOTO THRU TOTO-FN.                               P310";
    Matcher m = Pattern.compile("MOVE((?!.*MOVE.*).)*TO\\s+[^\\.]*" + "dest5" + "(\\s+|\\.|$)", Pattern.MULTILINE).matcher(paragraphePart);
    while (m.find()) {
        // process m.group(0)
    }

m.group(0) contains:

    MOVE        src1 TO dest1                                    P110
    084510     MOVE        src2 TO dest2                                    P111
    084520     MOVE        src3 TO dest3                                    P115
    084530     MOVE        src4 TO dest4                                    P120
    084540     MOVE        src5 TO dest5.

But I only want to get this line: "MOVE src5 TO dest5." In my regex I have to use something like MOVE.*TO, because the statement can be split across two lines, as in this case:

    084540     MOVE                        P120
    084550     src5 TO dest5.

Here I have to get MOVE P120 084550 src5 TO dest5, not just src5 TO dest5.

So how can I tell my regex to find MOVE followed by anything, except another "MOVE", and then followed by "TO"?

Thanks

[SOLVED]
I use:

    Matcher m = Pattern.compile("(MOVE(?!.*?MOVE).*?\\s+TO\\s+[^\\.]*"+fieldName+"(\\s+|\\.|$))", Pattern.DOTALL).matcher(paragraphePart);

Thank you anubhava!
https://stackoverflow.com/a/8803309/1140748

[NEW PB] Using

    Matcher m = Pattern.compile("(MOVE(?!.*?MOVE).*?\\s+TO\\s+[^\\.]*"+"dest5"+"(\\s+|\\.|$))", Pattern.DOTALL).matcher(paragraphePart);

I can get MOVE src5 TO dest5. But if I try "dest4" to get the line "MOVE src4 TO dest4", it no longer works. Do you have any idea?

    Matcher m = Pattern.compile("(MOVE(?!.*?MOVE.*?"+fieldName+").*?\\s+\\w+\\s+TO\\s+[^\\.]*"+fieldName+"(\\s+|\\.|$))", Pattern.DOTALL).matcher(paragraphePart);


alain.janinm

2 Answers

1

You can use the following regex, which is based on a negative lookahead:

    String needle = "dest5";
    Matcher m = Pattern.compile("(MOVE(?!.*?MOVE.*?" + needle + ").*?\\s+.+?\\s+TO\\s+" + needle + ")", Pattern.DOTALL).matcher(paragraphePart);
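
To try the pattern against the sample paragraph from the question, a minimal sketch could look like the following. The class name `MoveRegexDemo` and the helper `findMoves` are hypothetical, introduced only for illustration; wrapping the field name in `Pattern.quote` and collapsing whitespace in the result are assumptions that are not part of the regex above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MoveRegexDemo {
    // Hypothetical helper: returns each MOVE statement that targets `needle`,
    // with runs of whitespace (including line breaks) collapsed to one space.
    static List<String> findMoves(String paragraph, String needle) {
        String q = Pattern.quote(needle); // assumption: the field name is matched literally
        Pattern p = Pattern.compile(
                "(MOVE(?!.*?MOVE.*?" + q + ").*?\\s+.+?\\s+TO\\s+" + q + ")",
                Pattern.DOTALL);
        Matcher m = p.matcher(paragraph);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1).replaceAll("\\s+", " "));
        }
        return found;
    }

    public static void main(String[] args) {
        String paragraph = String.join("\n",
                "PNAME.                                      P000",
                "084500     MOVE        src1 TO dest1        P110",
                "084510     MOVE        src2 TO dest2        P111",
                "084520     MOVE        src3 TO dest3        P115",
                "084530     MOVE        src4 TO dest4        P120",
                "084540     MOVE        src5 TO dest5.       P140",
                "084550     PERFORM     TOTO THRU TOTO-FN.   P310");
        System.out.println(findMoves(paragraph, "dest5")); // [MOVE src5 TO dest5]
        System.out.println(findMoves(paragraph, "dest4")); // [MOVE src4 TO dest4]
    }
}
```

The lookahead rejects any MOVE that is followed by a later MOVE which itself precedes the needle, so only the MOVE closest to the needle survives.
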
anubhava
  • This doesn't give the right answer: in the last example I gave, the statement spans 2 lines, so we'd get "src5 TO dest5", whereas I need "MOVE P120 084550 src5 TO dest5". The other problem is that I need the MOVE keyword in the regex; I don't want every sentence containing "src5". – alain.janinm Jan 10 '12 at 12:39
  • Please check my updated answer; it works with both of your examples. – anubhava Jan 10 '12 at 13:15
  • Thanks a lot, `(?!.*?MOVE).*?` does the trick :) I've done some research, but I'm not sure I understand how it works... What does the `?` after `.*` mean? – alain.janinm Jan 10 '12 at 17:22
  • `.*?` is a **non-greedy** match of text of length 0 or more. Without the `?`, the regex above becomes greedy and will not stop before MOVE. – anubhava Jan 10 '12 at 17:43
  • OK, now I understand what "Greedy quantifiers" and "Reluctant quantifiers" mean in the Java docs. Thanks again! – alain.janinm Jan 10 '12 at 17:52
  • Using `Matcher m = Pattern.compile("(MOVE(?!.*?MOVE).*?\\s+TO\\s+[^\\.]*"+"dest5"+"(\\s+|\\.|$))", Pattern.DOTALL).matcher(paragraphePart);` I can get MOVE src5 TO dest5. But if I try "dest4" to get the line "MOVE src4 TO dest4", it no longer works. Do you have any idea? – alain.janinm Jan 11 '12 at 09:45
  • Updated my answer, please check. – anubhava Jan 11 '12 at 12:36
  • The goal is to find the line based on DEST4, not SRC4. I've tried to modify your regex, but it still doesn't work... Can you give me a regex that uses dest4 instead of src4? Thanks – alain.janinm Jan 11 '12 at 15:36
  • Updated my answer again, please check. – anubhava Jan 11 '12 at 15:54
  • It works for me for the search needles dest5, dest4, dest3, etc. Please tell me which cases it doesn't work for. – anubhava Jan 12 '12 at 05:04
  • Updated my answer again, please check. We were using `\\w+`, which would not match the single quote `'` in your new text; now we're using `.+?` to match everything in a non-greedy manner. – anubhava Jan 12 '12 at 10:18
  • Thanks a lot, this one works! By the way, I noticed that using `.+` decreases performance dramatically. I'll deal with that in my Java code! – alain.janinm Jan 12 '12 at 14:09
  • Instead of `.+?` you can try `[^\\s]+`, which might improve performance. – anubhava Jan 12 '12 at 14:18
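
The difference between greedy and reluctant quantifiers discussed in the comments above can be seen on a single line containing two MOVE statements. This is a small illustrative sketch; the class `QuantifierDemo`, the helper `firstMatch`, and the sample line are made up for the demonstration. The last call uses `\\S+`, which is equivalent to the suggested `[^\\s]+`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first match of `regex` in `input`, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "MOVE src1 TO dest1 MOVE src2 TO dest2";
        // Greedy: .* grabs as much as possible, so the match runs to the last TO.
        System.out.println(firstMatch("MOVE.*TO", line));           // MOVE src1 TO dest1 MOVE src2 TO
        // Reluctant: .*? grabs as little as possible, so the match stops at the first TO.
        System.out.println(firstMatch("MOVE.*?TO", line));          // MOVE src1 TO
        // \S+ cannot cross whitespace, which leaves the engine far fewer ways to backtrack.
        System.out.println(firstMatch("MOVE\\s+\\S+\\s+TO", line)); // MOVE src1 TO
    }
}
```
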
0

There's no easy way to negate an entire word; you can only negate letter by letter.

It seems to me that the easiest way to do this is to use regular Java code rather than a regex.
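
As a rough sketch of what such plain Java code might look like (the class `MoveScanner` and its method are hypothetical, and the approach is only one possible reading of this suggestion): split the paragraph into whitespace-separated tokens, remember the most recent MOVE, and emit the tokens from that MOVE up to a `TO <field>` pair. It assumes the field name appears as its own token, optionally followed by a period, and it does not understand COBOL beyond that.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MoveScanner {
    // Returns "MOVE ... TO <needle>" token runs, joined with single spaces.
    static List<String> findMoves(String paragraph, String needle) {
        String[] tokens = paragraph.trim().split("\\s+");
        List<String> found = new ArrayList<>();
        int lastMove = -1; // index of the most recent MOVE token, -1 if none is pending
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals("MOVE")) {
                lastMove = i;
            } else if (lastMove >= 0 && tokens[i - 1].equals("TO")
                    && (tokens[i].equals(needle) || tokens[i].equals(needle + "."))) {
                // Copy tokens lastMove..i inclusive.
                found.add(String.join(" ", Arrays.copyOfRange(tokens, lastMove, i + 1)));
                lastMove = -1;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String split = "084540     MOVE                        P120\n084550     src5 TO dest5.";
        System.out.println(findMoves(split, "dest5")); // [MOVE P120 084550 src5 TO dest5.]
    }
}
```

Unlike the regex versions, this keeps the trailing period when the statement ends with one, and it needs no backtracking.
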

Ilya Kogan
  • Thanks for the fast answer! I think it is possible; I've found some threads discussing this, such as http://stackoverflow.com/questions/1240275/how-to-negate-specific-word-in-regex I really need to use a regex because it's the easiest way to find all MOVE statements involving a given field in a full COBOL program. – alain.janinm Jan 10 '12 at 12:30
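
One common way to exclude a whole word in Java regex is to test a negative lookahead before every consumed character, as in `(?:(?!MOVE).)*?`. The sketch below applies that idiom to the question's data; it is an illustration under assumptions and not the regex from either answer. The names `TemperedMoveDemo` and `findMoves` are hypothetical, and the trailing `(?![\\w-])` is an added assumption so that `dest5` does not also match a longer name such as `dest55`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemperedMoveDemo {
    // Returns each "MOVE ... TO <needle>" run that contains no second MOVE,
    // with whitespace (including line breaks) collapsed to single spaces.
    static List<String> findMoves(String paragraph, String needle) {
        Pattern p = Pattern.compile(
                "MOVE(?:(?!MOVE).)*?\\s+TO\\s+" + Pattern.quote(needle) + "(?![\\w-])",
                Pattern.DOTALL);
        Matcher m = p.matcher(paragraph);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group().replaceAll("\\s+", " "));
        }
        return found;
    }

    public static void main(String[] args) {
        String paragraph = String.join("\n",
                "084530     MOVE        src4 TO dest4        P120",
                "084540     MOVE        src5 TO dest5.       P140");
        System.out.println(findMoves(paragraph, "dest4")); // [MOVE src4 TO dest4]
        System.out.println(findMoves(paragraph, "dest5")); // [MOVE src5 TO dest5]
    }
}
```

Because the lookahead refuses to step onto the `M` of another MOVE, a match that starts at an earlier MOVE can never extend past the next one.
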