
I am aware of the trim() function for String, and I am trying to implement it on my own to better understand regex. The following code does not seem to work in Java. Any input?

private static String secondWay(String input) {
  Pattern pattern = Pattern.compile("^\\s+(.*)(\\s$)+");
  Matcher matcher = pattern.matcher(input);
  String output = null;
  while (matcher.find()) {
    output = matcher.group(1);
    System.out.println("'" + output + "'");
  }
  return output;
}

The output for

input = "    This is a test    " is 'This is a test   '

I am able to do it another way, like this:

private static final String start_spaces = "^(\\s)+";
private static final String end_spaces = "(\\s)+$";
private static String oneWay(String input) {
       String output;
       input = input.replaceAll(start_spaces,"");
       output = input.replaceAll(end_spaces,"");
       System.out.println("'"+output+"'");
       return output;
}

The output is correct:

'This is a test'

I want to modify my first method to run correctly and return the result.

Any help is appreciated. Thank you :)

Adil F

2 Answers


Your pattern is incorrect. It matches the leading whitespace, then .* (greedy) consumes everything up to the last whitespace character, and (\s$)+ captures only that final whitespace character at the end of the string. The rest of the trailing whitespace therefore ends up inside group 1.

You want the following instead, with ? after .* to make the match non-greedy.

Pattern pattern = Pattern.compile("^\\s+(.*?)\\s+$");

Regular expression:

^              # the beginning of the string
\s+            # whitespace (\n, \r, \t, \f, and " ") (1 or more times)
(              # group and capture to \1:
 .*?           # any character except \n (0 or more times)
)              # end of \1
\s+            # whitespace (\n, \r, \t, \f, and " ") (1 or more times)
$              # before an optional \n, and the end of the string
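
To see the difference side by side, here is a minimal sketch that runs both patterns against the input from the question. The class name GreedyVsLazyDemo and the helper firstGroup are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazyDemo {
    // Hypothetical helper: returns group 1 of the first match,
    // or null when the pattern does not match the input at all.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "    This is a test    ";

        // Greedy: (.*) gives back only the final whitespace character.
        // prints 'This is a test   '
        System.out.println("'" + firstGroup("^\\s+(.*)(\\s$)+", input) + "'");

        // Lazy: (.*?) grows only until the trailing \s+ can reach the end.
        // prints 'This is a test'
        System.out.println("'" + firstGroup("^\\s+(.*?)\\s+$", input) + "'");

        // Both \s+ parts require at least one whitespace character,
        // so an unpadded string does not match and the helper returns null.
        System.out.println(firstGroup("^\\s+(.*?)\\s+$", "no padding"));
    }
}
```

Note the last case: because the pattern needs whitespace on both sides, a method built on it should probably fall back to returning the input unchanged when find() fails.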


EDIT: If you want to capture the leading and trailing whitespace into groups, just place a capturing group () around them as well.

Pattern pattern = Pattern.compile("^(\\s+)(.*?)(\\s+)$");
  • Group 1 contains leading whitespace
  • Group 2 contains your matched text
  • Group 3 contains trailing whitespace
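
As a rough illustration of reading those three groups, something like the following should work. The class name TrimGroupsDemo and the method split are hypothetical, and returning null for a non-matching input is just one possible choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrimGroupsDemo {
    private static final Pattern PADDED = Pattern.compile("^(\\s+)(.*?)(\\s+)$");

    // Hypothetical helper: returns {leading, text, trailing},
    // or null when the input is not padded on both sides.
    static String[] split(String input) {
        Matcher matcher = PADDED.matcher(input);
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2), matcher.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = split("  hello   ");
        // Group lengths make the invisible whitespace easy to check.
        System.out.println("leading length:  " + parts[0].length());  // 2
        System.out.println("text:            '" + parts[1] + "'");    // 'hello'
        System.out.println("trailing length: " + parts[2].length());  // 3
    }
}
```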

FYI, if you only want to strip the leading and trailing whitespace, you can do it in one line.

input.replaceAll("^\\s+|\\s+$", "");
hwnd
  • I will accept your answer, but can you tell me how I can put the leading and trailing spaces into groups? – Adil F Jun 01 '14 at 00:13
  • Thank you @hwnd. ? was the missing link. :) – Adil F Jun 01 '14 at 00:19
  • @AdilF Following `.*` with `?` makes it match as little as possible, whereas plain `.*` matches as much as possible. – hwnd Jun 01 '14 at 00:29
  • I don't get it. With .* I got 'This is a test ' (with spaces; the comment box is removing the trailing spaces), while with .*? I get 'This is a test'. I was under the impression that ? makes the group more greedy. – Adil F Jun 01 '14 at 00:34
  • @AdilF Take a look [here](http://stackoverflow.com/questions/2301285/what-do-lazy-and-greedy-mean-in-the-context-of-regular-expressions) – hwnd Jun 01 '14 at 00:35

I realize you are using a Pattern and Matcher, but this is the easiest way to do it:

private static String secondWay(String input) {
    String pattern = "^\\s+|\\s+$"; // notice it's a string
    return input.replaceAll(pattern, ""); 
}

The regex is ^\\s+|\\s+$ which matches:

  • all starting whitespace (^ means start of the input and \\s+ means one or more whitespace characters)
  • or (| means or)
  • all ending whitespace ($ means end of the input, since MULTILINE mode is not enabled)
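
A quick sketch to check the edge cases, since this version, unlike a pattern that requires whitespace on both sides, also handles strings padded on one side only or not at all. The class name ReplaceAllTrimDemo and the method trimWithRegex are names chosen for this example.

```java
class ReplaceAllTrimDemo {
    // Removes leading and trailing whitespace; inner whitespace is untouched.
    static String trimWithRegex(String input) {
        return input.replaceAll("^\\s+|\\s+$", "");
    }

    public static void main(String[] args) {
        String[] samples = {
            "    This is a test    ", // padded on both sides
            "left only",              // no padding at all
            "\t tab and newline\n",   // mixed whitespace characters
            "   "                     // whitespace only
        };
        for (String sample : samples) {
            // Quotes make any leftover whitespace visible.
            System.out.println("'" + trimWithRegex(sample) + "'");
        }
    }
}
```

If you call this often, it may be worth precompiling the pattern with Pattern.compile and reusing it, because String.replaceAll compiles the regex on every call.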
Michael Yaworski