2

Given:

-- input --

Keep this.
And keep this.

And keep this too.
Chomp this chomp:
Anything beyond here gets chomped.

-- output (expected) --

Keep this.
And keep this.

And keep this too.

How can I use a regex with capturing groups so that, once "chomp:" is found, everything from the beginning of that line through the end of the text gets chomped (deleted)?

String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
        + "This could be anything here chomp:\nAnything beyond here gets chomped.";
Pattern CHOMP= Pattern.compile("^((.*)chomp:(.*))$",  Pattern.MULTILINE | Pattern.DOTALL);
Matcher m = CHOMP.matcher(text);
if (m.find()) {
    int count = m.groupCount();
    //         
    // How can I match a group here to either delete or keep for expected output?
    //
    // text = <match a group to assign or replace non-desired text>;
    System.out.println(text);  // Should output contents from above -- output (expected) --
}
genxgeek

4 Answers

1

Here is one approach and a demonstration on ideone.

I've simplified the pattern slightly; however, the biggest change in my code is that it runs without the DOTALL option. With DOTALL, the . also matches line terminators, so .* incorrectly matches across multiple lines.

^(.*)chomp:(.*)

The pattern should match once (as seems to be the intent) and fill groups 1 and 2 with the text before and after "chomp:" on that line. The remainder of the data is effectively "consumed" because it is simply never processed. To get the data before the regular expression match (and not the match itself), I use the following construct:

StringBuffer sb = new StringBuffer();
matcher.appendReplacement(sb, "");

(While this could be replaced with a substring, I suppose, this idiom mirrors other patterns.)
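As a minimal sketch of the substring alternative mentioned above: m.start() is the index where the matched line begins, so everything before it is the text to keep. The class name ChompSubstring and the helper chomp are illustrative names, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ChompSubstring {
    // MULTILINE lets ^ match at the start of each line; DOTALL is left off
    // so that . stays within a single line.
    private static final Pattern CHOMP = Pattern.compile("^.*chomp:", Pattern.MULTILINE);

    // Hypothetical helper: returns the text before the first line that
    // contains "chomp:", or the whole text if no such line exists.
    static String chomp(String text) {
        Matcher m = CHOMP.matcher(text);
        return m.find() ? text.substring(0, m.start()) : text;
    }

    public static void main(String[] args) {
        String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
                + "Chomp this chomp:\nAnything beyond here gets chomped.";
        System.out.print(chomp(text));
    }
}
```

Because the match starts at the beginning of the line, the contents of the line before "chomp:" do not matter.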


If you wish to do line-oriented processing (which would be suitable for large streams), then the correct approach is to process each line in turn. I would probably use either a split or a Scanner approach myself, but I wished to keep this answer within the whole-regex approach originally presented.

For instance:

Scanner s = new Scanner(input);
while (s.hasNextLine()) {
    // process next line and "break" if it matches the end-line condition
}
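A fuller sketch of that loop might look like the following. It assumes the end-line condition is simply "the line contains chomp:"; the class name ChompLines and the helper keepUntil are hypothetical. Note that it rejoins lines with \n, so other line terminators in the input are normalized.

```java
import java.util.Scanner;

public class ChompLines {
    // Hypothetical helper: copies lines until one contains the marker,
    // then stops reading, which drops that line and everything after it.
    static String keepUntil(String input, String marker) {
        StringBuilder kept = new StringBuilder();
        try (Scanner s = new Scanner(input)) {
            while (s.hasNextLine()) {
                String line = s.nextLine();
                if (line.contains(marker)) {
                    break; // end-line condition reached
                }
                kept.append(line).append('\n');
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
                + "Chomp this chomp:\nAnything beyond here gets chomped.";
        System.out.print(keepUntil(text, "chomp:"));
    }
}
```

Since the loop stops at the first matching line, the rest of a large stream is never read.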

Snippet from ideone:

String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
        + "Chomp this chomp:\nAnything beyond here gets chomped.";
Pattern CHOMP= Pattern.compile("^(.*)chomp:(.*)",  Pattern.MULTILINE);
Matcher m = CHOMP.matcher(text);
if (m.find()) {
    System.out.println("  LINE:" + m.group(0));
    System.out.println("BEFORE:" + m.group(1));
    System.out.println(" AFTER:" + m.group(2));
    System.out.println(">>>");
    StringBuffer sb = new StringBuffer();
    m.appendReplacement(sb, "");
    System.out.print(sb);
    System.out.println("<<<");
}
user2864740
0

I used this approach which achieves the expected output:

    public static void main(String[] args) {
        String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
                + "Chomp this chomp:\nAnything beyond here gets chomped.";
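        // [cC] matches either case of the first letter; inside a character
        // class, | is a literal pipe, so it is not needed here.
        Pattern CHOMP= Pattern.compile("[cC]homp");
        Matcher m = CHOMP.matcher(text);
        if (m.find()) {
            // Keep only the text before the first match.
            String s = text.substring(0, m.start());

            System.out.println(s);
        }
    }

[cC] matches an upper- or lower-case "c"; both forms appear in this example. When the first instance of chomp/Chomp is found, I call the substring method, which removes everything from the start of that first match onward.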

I know you mentioned using groups, is there a specific reason for this or does this solution suffice?

Sionnach733
  • ok so now for a bit more specifics (sorry for lack thereof), which is why I wanted to match to beginning of line regardless of start of line contents...so the following input will not work. – genxgeek Dec 05 '13 at 16:55
  • String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n" + "This is anything up to here chomp:\nAnything beyond here gets chomped."; – genxgeek Dec 05 '13 at 16:56
  • do you mean that everything in a line that contains "chomp" gets dropped? – Sionnach733 Dec 05 '13 at 17:26
  • but I need to also match another email pattern before chomp: such as "\nThis is anything up to here john@my.com chomp:" – genxgeek Dec 05 '13 at 17:36
0

One way could be :

  1. Split the string on the literal dot character (.)

  2. Iterate through lines. Break out of loop as soon as you find chomp else print the lines.

Code fragment that accomplishes this:

String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
        + "Chomp this chomp:\nAnything beyond here gets chomped.";
String[] split = text.split("\\.");
for (int i = 0; i < split.length; i++) {
    if (split[i].contains("Chomp") || split[i].contains("chomp"))
        break;
    System.out.println(split[i]);
}

Output :

Keep this

And keep this


And keep this too

"\nChomp this chomp:\nAnything beyond here gets chomped." is not in the output.

Nishant Lakhara
0
String newText = text.replaceAll("(?m)^.*chomp(?s).*", "");

The inline modifier (?m) turns on MULTILINE mode so the ^ can match the beginning of a line. But DOTALL mode is still off, so if it doesn't find chomp in that same line, it gives up and tries again at the beginning of the next line. When it does find a line with chomp in it, (?s) turns on DOTALL mode so the second .* can consume the rest of the text, newlines and all.
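To illustrate, here is a small runnable sketch that wraps the same replaceAll call and applies it to the input from the question's comments, where the chomp line begins with arbitrary text. The class name ChompReplace and the helper chomp are illustrative names only.

```java
public class ChompReplace {
    // Hypothetical helper wrapping the one-liner above: removes the first
    // line containing "chomp" and everything after it.
    static String chomp(String text) {
        // (?m): ^ matches at line starts; (?s): from here on, . matches newlines too.
        return text.replaceAll("(?m)^.*chomp(?s).*", "");
    }

    public static void main(String[] args) {
        String text = "Keep this.\nAnd keep this.\n\nAnd keep this too.\n"
                + "This is anything up to here john@my.com chomp:\n"
                + "Anything beyond here gets chomped.";
        System.out.print(chomp(text));
    }
}
```

Text that contains no chomp line is returned unchanged, because the regex never matches.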

I don't know what you're trying to do with groupCount(). If your goal is just to get rid of the chomp line and everything after it, you don't need to use capturing groups. Anyway, that method only tells how many capturing groups there are in the regex. It's a static property of the Pattern object associated with the Matcher; it doesn't tell you anything about what was actually matched.
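A short sketch of that point, using the pattern from the question without DOTALL: groupCount() returns the same number whether or not the input matches. The class name GroupCountDemo and the helper countGroups are hypothetical names for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountDemo {
    private static final Pattern CHOMP =
            Pattern.compile("^((.*)chomp:(.*))$", Pattern.MULTILINE);

    // Hypothetical helper: runs find() and then reports groupCount(),
    // which depends only on the pattern, not on the input.
    static int countGroups(String input) {
        Matcher m = CHOMP.matcher(input);
        m.find(); // result deliberately ignored
        return m.groupCount();
    }

    public static void main(String[] args) {
        System.out.println(countGroups("nothing to match here")); // prints 3
        System.out.println(countGroups("Chomp this chomp: tail")); // prints 3
    }
}
```

To learn what was actually matched, check the return value of find() and then read group(1), group(2), and so on.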

Alan Moore