Query about the trim() method in Java

Question

I asked a question earlier but met harsh criticism, so here I pose it again. Simpler, and rephrased to appeal to those who may have been concerned about the way I asked it before.

BACKGROUND I am parsing some HTML for information. Working through a series of lines, I have stripped away everything except the content I wish to grab and a run of spaces after it. To get rid of the spaces, I opted to use trim(), but I have been having trouble. The last few lines of my code are tests:

System.out.println("'" + someString + "'\n'" + someString.trim() + "'");

The results were:

'Sophomore                                          '
'Sophomore                                          '

I was worried I might have a problem with the way I was calling trim(), since we all make mistakes from time to time, so I tested it like this:

String s = "   hello         ";
System.out.println("'" + s + "'\n'" + s.trim() + "'");

The results were:

'   hello         '
'hello'

MY QUESTION What am I doing wrong? What I want is to get 'Sophomore', not 'Sophomore '

I look forward to your excellent answers (thanks in advance!).

I suppose it is. But this is a standalone. I'm not trolling or anything. I may eventually post a video of my IDE if this keeps getting downvoted. — Olin Kirkland, Sep 09 '12 at 23:22
It's tagged regex because I figured someone might post an answer with a regex solution if trim() is not an option. — Olin Kirkland, Sep 09 '12 at 23:23
@OlinKirkland that's not a valid reason to tag the question. — Alnitak, Sep 09 '12 at 23:23
I think the question is ok and +1 for it. They can say whatever they want. :) — , Sep 09 '12 at 23:25
trim() is for removing whitespace. You are getting different output because you are testing it the wrong way. Look at my answer. — FirmView, Sep 09 '12 at 23:26
@DavorLozic I really appreciate your thumbs up. Got any ideas how to resolve this, though? :L — Olin Kirkland, Sep 09 '12 at 23:26
Also why is "What's your question?" thumbed up so much. Did people just not read the question...? — Olin Kirkland, Sep 09 '12 at 23:27
I think you need to look at what encoding your string uses, which, by the way, is impossible to know from this information alone. — Alexander, Sep 09 '12 at 23:30
@OlinKirkland Maybe they don't understand English? ^^ No, I don't know the answer. I'm the .NET kind of guy but with this title you've got my attention. :) — , Sep 09 '12 at 23:31
String.trim() removes characters less than or equal to space, i.e. (char) 32. As such it doesn't remove all whitespace characters, but it does remove some control characters which are not whitespace. — Peter Lawrey, Sep 10 '12 at 07:58

Alnitak · Accepted Answer · 2012-09-26T10:47:32.513

String.trim() only removes leading and trailing characters whose code is \u0020 or lower: everything before the first character whose code exceeds \u0020, and everything after the last such character.

This is insufficient to remove all possible white space characters - Unicode defines several more (with code points above \u0020) that will not be matched by .trim().

Perhaps your white space characters aren't the ones you think they are?
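As a minimal sketch of that behaviour, the following compares a string padded with ordinary spaces against one padded with a higher code point that merely looks like a space. U+00A0 (no-break space) is used here as one example of such a character; the class name `TrimDemo` and the helper `trimChanges` are made up for illustration.

```java
class TrimDemo {
    // Hypothetical helper: true if String.trim() alters the given string.
    static boolean trimChanges(String s) {
        return !s.trim().equals(s);
    }

    public static void main(String[] args) {
        String plain = "Sophomore   ";                // padded with ordinary spaces
        String nbsp = "Sophomore\u00a0\u00a0\u00a0";  // padded with no-break spaces

        System.out.println(trimChanges(plain)); // true: ordinary spaces are stripped
        System.out.println(trimChanges(nbsp));  // false: code 0xA0 is above 0x20

        // The no-break space is a Unicode space separator, but Java's
        // Character.isWhitespace deliberately excludes it.
        System.out.println(Character.isSpaceChar('\u00a0'));  // true
        System.out.println(Character.isWhitespace('\u00a0')); // false
    }
}
```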

EDIT: the comments revealed that the extra characters were indeed "special" whitespace characters, specifically \u00a0, the Unicode "non-breaking space". To replace those with normal spaces, use:

str = str.replace('\u00a0', ' ');
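Combined with the original trim() call, the replacement might be wrapped up as follows. This is only a sketch: the class `NbspTrim` and the helper `trimNbsp` are hypothetical names, and the input string is a stand-in for the parsed HTML text.

```java
class NbspTrim {
    // Hypothetical helper: turn every no-break space into an ordinary
    // space, then let trim() strip the leading and trailing padding.
    static String trimNbsp(String s) {
        return s.replace('\u00a0', ' ').trim();
    }

    public static void main(String[] args) {
        String someString = "Sophomore\u00a0\u00a0\u00a0";
        System.out.println("'" + trimNbsp(someString) + "'"); // prints 'Sophomore'
    }
}
```

Note that this also converts any no-break spaces in the middle of the string into ordinary spaces, which may or may not be wanted.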

edited Sep 26 '12 at 10:47

answered Sep 09 '12 at 23:27

Alnitak


THANK YOU. THAT MIGHT BE IT. I've been thinking this for a while. What could they be?? If they aren't spaces, why do they look like them?? – Olin Kirkland Sep 09 '12 at 23:28
Agree. The critical thing he's not showing us is the pre-processed text such as a small test case data that shows the error. 1+ – Hovercraft Full Of Eels Sep 09 '12 at 23:28
@OlinKirkland try looping over the string and using `codePointAt` to find out each character's value. They might be alternate Unicode characters, for example. – Alnitak Sep 09 '12 at 23:29
@ Hovercraft, what do you mean by the pre-processed text? The exact copy before I cut out the beginning and end of the string? – Olin Kirkland Sep 09 '12 at 23:29
@Olin: a small bit of text that when processed by the [sscce](http://sscce.org) that you would normally post for a question like this, would reproduce the problem. – Hovercraft Full Of Eels Sep 09 '12 at 23:30
53 6f 70 68 6f 6d 6f 72 65 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 a0 This is the output when I did what muratgu suggested. I think I'm getting a little deep for my expertise. What does this mean, exactly? Do you guys know? – Olin Kirkland Sep 09 '12 at 23:37
The `a0` characters are your problem. They're a Unicode "no-break-space" and as such not recognised by `.trim()`. http://www.fileformat.info/info/unicode/char/a0/index.htm – Alnitak Sep 09 '12 at 23:40
Well, damn. Will I just have to write my own trim() method to take care of just a0 characters? Or better yet, use replace() to replace a0 with " "? How do I apply replace to the character hexes? Stupid college sites not using actual spaces. – Olin Kirkland Sep 09 '12 at 23:41
@OlinKirkland, [Non-breaking space](http://en.wikipedia.org/wiki/Non-breaking_space) – Alexander Sep 09 '12 at 23:42
@OlinKirkland you should be able to write a regex (oh, the irony...) to replace `\u00a0` with a normal space, and then use `.trim` as before. – Alnitak Sep 09 '12 at 23:42
http://stackoverflow.com/questions/4455218/remove-specific-character-from-a-string-based-on-hex-value-c-sharp – muratgu Sep 09 '12 at 23:42
@muratgu that answer is c#, not Java – Alnitak Sep 09 '12 at 23:43
@OlinKirkland `str = str.replace('\u00a0', ' ');` – Alnitak Sep 09 '12 at 23:44

score 1 · Answer 2 · answered Sep 09 '12 at 23:35

There must be a character in the source string that trim() does not treat as whitespace. Add the following to your code and see what it prints.

for (char ch : someString.toCharArray()) {
     System.out.print(Integer.toHexString(ch) + " ");
}
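A variant of the same check using `codePointAt`, as suggested in the comments on the accepted answer, could look like this. It is a sketch only: `CodePointDump` and `hexDump` are hypothetical names, and the sample input assumes the trailing characters are no-break spaces.

```java
class CodePointDump {
    // Hypothetical helper: the string's code points in hex, space-separated.
    static String hexDump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(Integer.toHexString(cp));
            // Step by one or two chars so surrogate pairs are read once.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints: 53 6f 70 68 6f 6d 6f 72 65 a0 a0
        System.out.println(hexDump("Sophomore\u00a0\u00a0"));
    }
}
```

Any value other than 20 in the trailing positions points to a character that is not an ordinary space.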

answered Sep 09 '12 at 23:35

muratgu


join the conversation on Alnitak's answer. I'm posting subsequent information there. – Olin Kirkland Sep 09 '12 at 23:38
