2

I have to process a string with the following rules:

  • It may or may not start with a series of '.
  • It may or may not end with a series of '.
  • Whatever is enclosed between the above should be extracted. However, the enclosed string also may or may not contain a series of '.

For example, I can get the following strings as input:

  • ''''aa''''
  • ''''aa
  • aa''''
  • ''''aa''bb''cc''''

For the above examples, I would like to extract the following from them (respectively):

  • aa
  • aa
  • aa
  • aa''bb''cc

I tried the following code in Java:

Pattern p = Pattern.compile("[^']+(.+'*.+)[^']*");
Matcher m = p.matcher("''''aa''bb''cc''''");
while (m.find()) {
    int count = m.groupCount();
    System.out.println("count = " + count);
    for (int i = 0; i <= count; i++) {
        System.out.println("-> " + m.group(i));
    }
}

But I get the following output:

count = 1
-> aa''bb''cc''''
-> ''bb''cc''''

Any pointers?

EDIT: Never mind, I was using a * at the end of my regex instead of a +. Making this change gives me the desired output. But I would still welcome any improvements to the regex.

Saadat
  • Take a look at this question, I think it's a good start: http://stackoverflow.com/questions/2088037/trim-characters-in-java –  May 31 '12 at 08:02
  • Thanks. It did cross my mind to use trim, but I dismissed the thought. I guess that would be better than using regex, no? – Saadat May 31 '12 at 08:05
  • Well trim wouldn't actually do it. The person that asked that question I'm sure asked it that way because the operation they're after seems logically similar to what trim does. But look at the accepted answer there. The suggestion was to use a method called "strip" which does what you're trying to do. I'd just go with that unless you're doing this for educational purposes. –  May 31 '12 at 08:08
  • Actually, I now realize that what I am looking for is nothing more than "trim", i.e. trim the `'` from beginning and end of the string. (I am simply amazed at how stupid I have been here!) – Saadat May 31 '12 at 08:15
  • Trim won't do that in Java. It only works with spaces. `strip` from Apache Commons however, does have that functionality. –  May 31 '12 at 08:18
  • Yes, I was also referring to `StringUtils.stripStart(String, String)` and `StringUtils.stripEnd(String, String)` in Apache Commons. – Saadat May 31 '12 at 08:19
  • Ah yes. Well then, you're good to go. –  May 31 '12 at 08:20
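
For readers who cannot add Apache Commons, the strip idea discussed in these comments can be sketched in plain Java. This is only an illustration: `QuoteStripper` and its `strip` method are hypothetical names made up here, not part of the JDK or of Apache Commons, and the sketch assumes a single character is being stripped.

```java
// Hypothetical helper for illustration; not a JDK or Apache Commons class.
class QuoteStripper {

    // Removes every leading and trailing occurrence of c from s.
    // Occurrences of c in the middle of the string are left untouched.
    static String strip(String s, char c) {
        int start = 0;
        int end = s.length();
        // Skip the leading run of c.
        while (start < end && s.charAt(start) == c) {
            start++;
        }
        // Skip the trailing run of c.
        while (end > start && s.charAt(end - 1) == c) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String[] inputs = {"''''aa''''", "''''aa", "aa''''", "''''aa''bb''cc''''"};
        for (String input : inputs) {
            System.out.println(strip(input, '\''));
        }
    }
}
```

Run against the four inputs from the question, this should print aa, aa, aa and aa''bb''cc. A string made up only of quotes comes back empty.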

3 Answers

0

Have a look at the boundary matchers of Java's Pattern class (http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html). In particular, $ (end of a line) might be interesting. I also recommend the following Eclipse plugin for regex testing: http://sourceforge.net/projects/quickrex/. It lets you see exactly what the match and the groups of your regex will be for a given test string.

E.g. try the following pattern: [^']+(.+'*.+)+[^'$]

Korgen
0

This one works for me.

        String str = "''''aa''bb''cc''''";
        Pattern p = Pattern.compile("^'*(.*?)'*$");
        Matcher m = p.matcher(str);
        if (m.find()) {
            System.out.println(m.group(1));
        }
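
To check the same pattern against all four inputs from the question, a small harness along these lines may help. It is a sketch only: the class name `RegexStripDemo` and the `extract` helper are made up for illustration, and falling back to the unchanged input when nothing matches is an assumption, not something the question asks for.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class wrapping the pattern shown above.
class RegexStripDemo {

    // ^'*    leading quotes, consumed greedily
    // (.*?)  the payload, matched lazily so trailing quotes are not captured
    // '*$    trailing quotes up to the end of the input
    private static final Pattern QUOTED = Pattern.compile("^'*(.*?)'*$");

    static String extract(String input) {
        Matcher m = QUOTED.matcher(input);
        // Assumption: return the input unchanged if the pattern does not match
        // (for example when the input contains a line break).
        return m.find() ? m.group(1) : input;
    }

    public static void main(String[] args) {
        String[] inputs = {"''''aa''''", "''''aa", "aa''''", "''''aa''bb''cc''''"};
        for (String input : inputs) {
            System.out.println(input + " -> " + extract(input));
        }
    }
}
```

The lazy `(.*?)` is what keeps the inner `''` pairs while leaving the trailing run of quotes to `'*$`.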
AlexR
  • Thanks, your regex is much better than mine. However, I am now using `StringUtils.stripStart(String, String)` and `StringUtils.stripEnd(String, String)` from the Apache Commons to achieve the same thing. – Saadat May 31 '12 at 09:32
0

I'm not that good at Java, so I hope the regex alone is sufficient. It works well for your examples.

s/^'*(.+?)'*$/$1/gm
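
In Java, the substitution above could be sketched roughly as follows. The class and method names (`SubstitutionDemo`, `stripQuotes`) are hypothetical; the `m` flag is assumed to correspond to `Pattern.MULTILINE`, and the `g` flag to `Matcher.replaceAll`.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; a Java rendering of s/^'*(.+?)'*$/$1/gm.
class SubstitutionDemo {

    // MULTILINE makes ^ and $ match at line boundaries, like the m flag.
    private static final Pattern QUOTED =
            Pattern.compile("^'*(.+?)'*$", Pattern.MULTILINE);

    static String stripQuotes(String input) {
        // replaceAll substitutes every match, like the g flag;
        // "$1" refers to the first capturing group.
        return QUOTED.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("''''aa''bb''cc''''"));
        System.out.println(stripQuotes("''''aa"));
        System.out.println(stripQuotes("aa''''"));
    }
}
```

One caveat: because `.+?` needs at least one character, an input made up only of quotes keeps a single `'`. Using `.*?` instead avoids that.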
primfaktor