
My program reads some strings from a file, and they have to be post-processed. The original text in the file looks like

A1DY^
BLKSS^
"GH67^"^

Here ^ stands for a space character. As you can see, every word in the file ends with a space, and some words are in double quotes. I want to store these strings in my program as

A1DY
BLKSS
GH67

In other words, I want to trim all spaces and double quotes. If I use str.trim(), it removes only the last space, so the third line becomes "GH67^". I then used str.replaceAll("^\"|\"$", ""); to trim the double quotes, and the result is GH67^. That means I have to trim it again.
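For reference, the two-pass approach described above can be sketched as follows. The class name `TwoPassTrim` and the method `clean` are hypothetical, and the sample inputs assume that each ^ in the file is a single space.

```java
public class TwoPassTrim {
    // Reproduces the approach from the question: trim, strip one quote at each end, trim again.
    static String clean(String str) {
        str = str.trim();                     // "\"GH67 \" " -> "\"GH67 \""
        str = str.replaceAll("^\"|\"$", "");  // "\"GH67 \""  -> "GH67 "
        return str.trim();                    // "GH67 "      -> "GH67"
    }

    public static void main(String[] args) {
        String[] lines = { "A1DY ", "BLKSS ", "\"GH67 \" " };
        for (String line : lines) {
            // Brackets make any leftover spaces visible in the output.
            System.out.println("[" + clean(line) + "]");
        }
    }
}
```

This prints [A1DY], [BLKSS] and [GH67], but it needs three separate String operations per line.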

Is there a better way to remove all spaces and double quotes at once? Note that I don't want to extract the alphanumeric characters; I want to trim these special characters from the ends.

mahmood

2 Answers


This will trim any number of quotes or spaces from the beginning or end of your string:

str = str.replaceAll("^[ \"]+|[ \"]+$", "");
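A minimal runnable sketch of this one-pass approach, applied to the three sample lines from the question (the class name `QuoteSpaceTrim` and the method `clean` are hypothetical):

```java
public class QuoteSpaceTrim {
    // Removes any run of spaces and/or double quotes from both ends in a single call.
    static String clean(String str) {
        return str.replaceAll("^[ \"]+|[ \"]+$", "");
    }

    public static void main(String[] args) {
        String[] lines = { "A1DY ", "BLKSS ", "\"GH67 \" " };
        for (String line : lines) {
            // Brackets make any leftover spaces visible in the output.
            System.out.println("[" + clean(line) + "]");
        }
    }
}
```

Because spaces and quotes share one character class, this also strips unbalanced or repeated quotes (for example, "abc becomes abc) and leading spaces.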
shmosel
  • To prevent a common newbie mistake, add `str =` in front. – Andreas May 16 '17 at 20:30
  • Technically, removing leading spaces is counter to OP's requirements, but it likely makes no difference. – Andreas May 16 '17 at 20:35
  • @Andreas Where are you seeing that? – shmosel May 16 '17 at 20:36
  • OP said *"As you can see all words in the file **end with** space"*. Sure, OP also tried `trim()`, which trims from both ends, but that doesn't mean using `trim()` would have been right. Still, since OP didn't think `trim()` would be wrong, there will probably never be leading spaces that need to be retained, which is why I said *"it likely makes no difference"*, and I left my up-vote in place. – Andreas May 16 '17 at 20:42

In a strict interpretation of your question description, you only want trailing spaces removed, not leading spaces and not other whitespace characters like tabs (\t).

Also, a strict trimming function will remove double quotes only if both a leading and a trailing double quote are found, and it will remove only one such pair.

If double-quotes are present, trailing spaces inside the double-quotes should also be removed.

To accomplish all that, strictly, in a single regex operation, do this:

str = str.replaceFirst("^(\"?)(.*?) *\\1 *$", "$2");

This regex uses the ^ and $ anchors to ensure it only matches against the entire string.

The leading " is optional and, if present, is captured as group 1. The trailing " is matched only if the leading " was matched, and the leading " is matched only if a trailing " is also found. This is done with a \1 backreference to the optional leading ": if the pair cannot be completed, the regex engine backtracks and group 1 matches the empty string instead. If matched, both quotes are removed from the result.

No leading spaces are removed, but any trailing spaces before and/or after the optional trailing " are removed.

Anything not removed is captured in group 2, and retained in the replacement string.
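A minimal sketch that exercises the strict regex, including two inputs not in the question (an unbalanced quote and leading spaces) to show what is deliberately kept. The class name `StrictTrim` and the method `clean` are hypothetical.

```java
public class StrictTrim {
    // Removes trailing spaces, plus one matched pair of surrounding double quotes
    // (and any trailing spaces inside them); leading spaces are kept.
    static String clean(String str) {
        return str.replaceFirst("^(\"?)(.*?) *\\1 *$", "$2");
    }

    public static void main(String[] args) {
        String[] lines = { "A1DY ", "BLKSS ", "\"GH67 \" ", "\"GH67 ", "  A1DY " };
        for (String line : lines) {
            // Brackets make retained leading spaces and quotes visible in the output.
            System.out.println("[" + clean(line) + "]");
        }
    }
}
```

This prints [A1DY], [BLKSS], [GH67], ["GH67] and [  A1DY]: the unbalanced quote and the leading spaces survive, while the trailing spaces are removed in every case.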

Andreas