
Possible Duplicate:
Query about the trim() method in Java

I am parsing a site's usernames and other information, and each value has a bunch of trailing spaces (in addition to the spaces between the words). For example: "Bob the Builder " or "Sam the welder ". The number of spaces varies from name to name. I figured I'd just use .trim(), since I've used it before. However, it's giving me trouble. My code looks like this:

for (int i = 0; i < splitSource3.size(); i++) {
    splitSource3.set(i, splitSource3.get(i).trim());
}

The result is just the same; no spaces are removed at the end. Thank you in advance for your excellent answers!

UPDATE:

The full code is a bit more complicated, since there are HTML tags that are parsed out first. It goes exactly like this:

for (String s : splitSource2) {
    if (s.length() > "<td class=\"dddefault\">".length() && s.substring(0, "<td class=\"dddefault\">".length()).equals("<td class=\"dddefault\">")) {
        splitSource3.add(s.substring("<td class=\"dddefault\">".length()));
    }
}

System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
    // drop the last 5 characters (presumably the closing "</td>" tag)
    splitSource3.set(i, splitSource3.get(i).substring(0, splitSource3.get(i).length() - 5));
    splitSource3.set(i, splitSource3.get(i).trim());
    System.out.println(i + ": " + splitSource3.get(i));
}
}

UPDATE:

Calm down. I never said the fault lay with Java, and I never said it was a bug or broken or anything. I simply said I was having trouble with it and posted my code for you to collaborate on and help me solve my issue. Note the phrase "my issue", not "Java's issue". I have actually had the code print out

System.out.println(i + ": " + splitSource3.get(i) + "*");

in a for-each loop afterward.

This is how I knew I had a problem. By the way, the problem has still not been fixed.
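Printing a value between delimiters shows that something is left over, but not what it is. A minimal sketch for making the leftover characters visible, assuming the strings are already in memory: it prints the last few UTF-16 code units of a string as `U+XXXX`. The class name `CodePointDump`, the helper `describeTail`, and the sample value are hypothetical.

```java
public class CodePointDump {

    // Returns the last `count` UTF-16 code units of s as "U+XXXX" tokens,
    // so that invisible characters become visible.
    static String describeTail(String s, int count) {
        StringBuilder out = new StringBuilder();
        int start = Math.max(0, s.length() - count);
        for (int i = start; i < s.length(); i++) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: a value followed by two no-break spaces.
        String scraped = "Sophomore\u00A0\u00A0";
        System.out.println("'" + scraped.trim() + "'");
        // prints "U+0065 U+00A0 U+00A0": trim() left both U+00A0 in place
        System.out.println(describeTail(scraped.trim(), 3));
    }
}
```

If the dump shows ordinary spaces (`U+0020`), trim() should have removed them; any other code point at the end points to the data rather than to trim().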

UPDATE:

Sample output (the single quotes are not part of the output):

'0: Olin D. Kirkland                                          '
'1: Sophomore                                          '
'2: Someplace, Virginia  12345<br />VA SomeCity<br />'
'3: Undergraduate                                          '

EDIT: The OP rephrased his question at Query about the trim() method in Java, where the issue turned out to be Unicode whitespace characters (non-breaking spaces, U+00A0) that String.trim() does not remove.
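For reference, String.trim() removes only characters whose code is at most U+0020 (the space and the ASCII control characters), so a no-break space (U+00A0) at the end of a scraped value survives it. String.strip() (Java 11+) does not help either, because it relies on Character.isWhitespace, which explicitly excludes U+00A0. A minimal sketch of a trim that also treats Unicode space separators as trimmable; `UnicodeTrim` and `unicodeTrim` are hypothetical names, not a library API.

```java
public class UnicodeTrim {

    // Keeps trim()'s rule (anything <= U+0020) and additionally accepts
    // Character.isWhitespace and Character.isSpaceChar, the latter covering
    // the Unicode separator categories, which include U+00A0.
    static boolean isTrimmable(char c) {
        return c <= ' ' || Character.isWhitespace(c) || Character.isSpaceChar(c);
    }

    // Removes trimmable characters from both ends; interior ones are kept.
    static String unicodeTrim(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && isTrimmable(s.charAt(start))) {
            start++;
        }
        while (end > start && isTrimmable(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String name = "Olin D. Kirkland\u00A0\u00A0\u00A0";
        System.out.println("'" + name.trim() + "'");        // no-break spaces survive trim()
        System.out.println("'" + unicodeTrim(name) + "'");  // prints 'Olin D. Kirkland'
    }
}
```

In the question's loop this would replace the trim() call, e.g. `splitSource3.set(i, UnicodeTrim.unicodeTrim(splitSource3.get(i)));`. If U+00A0 is the only offender, `s.replace('\u00A0', ' ').trim()` is a shorter alternative.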

Olin Kirkland (edited by Community)
  • You have an extra bracket. Is that your actual code? If not, can you post exactly what you have? – Ash Burlaczenko Sep 09 '12 at 19:31
  • trim() isn't broken. Have you printed out before-and-after results for using it? Have you read the documentation carefully to see if it is documented to actually do what you need? – bmargulies Sep 09 '12 at 19:32
  • Sorry about the bracket. I was playing around with the code and rewrote what I had before in the description. – Olin Kirkland Sep 09 '12 at 19:36
  • Did you print the values before and after to check, as was suggested? – Peter Svensson Sep 09 '12 at 19:37
  • Of course I did. That's how I knew it wasn't working properly. – Olin Kirkland Sep 09 '12 at 19:40
  • Please show us an example of a value in the splitSource3 list and the output to the console for that value. – Dan D. Sep 09 '12 at 19:48
  • I think the first place to start is to refactor what you have into more readable code. If you use something more than once, store it in a variable; e.g. you are constantly calling `splitSource3.get(i)`, which could easily be stored in a variable at the beginning of the loop and, if required, re-assigned at the end. – Ash Burlaczenko Sep 09 '12 at 19:49
  • Is splitSource3 a list of strings? – gigadot Sep 09 '12 at 19:50
  • splitSource3 is an ArrayList. – Olin Kirkland Sep 09 '12 at 20:02
  • I suggest printing the strings like this: System.out.println(i + ": '" + splitSource3.get(i) + "'"); //note the apostrophes. This way you'll know whether it's really the trim that isn't working. – Tamas Rev Sep 09 '12 at 20:07
  • Ask yourself, what are the chances that you've found an undocumented bug vs. that you are misunderstanding the use of the tool or have an undiscovered bug in your code? We have all used `trim()` for years without problem. If it were possible, I'd love to wager a large amount of money that the fault here is not with Java. – Hovercraft Full Of Eels Sep 09 '12 at 22:18
  • As per request, I've posted sample outputs. – Olin Kirkland Sep 09 '12 at 22:34
  • @OlinKirkland I'd consider removing or rephrasing the comment directed at Hovercraft; his point is valid. – Dave Newton Sep 09 '12 at 22:34
  • @DaveNewton, I respectfully disagree. I don't think he read my question thoroughly because I never claimed that there was a problem with java. I *know* I did something wrong, which is why I posted my question in the first place. Please understand where I'm coming from. If you believe his point is valid, please reiterate his point and explain how it is valid, because I may be a little slow to realize the validity. – Olin Kirkland Sep 09 '12 at 22:38
  • @OlinKirkland Your subject said "trim() not working", before I edited it--IMO you caught grief for that, and it's reasonable that you would. Even assuming you *had* a reasonable title before I edited it, "please be helpful or go away" is hardly appropriate. – Dave Newton Sep 09 '12 at 22:46
  • @DaveNewton Thank you for editing my title! I'm truly thankful. I did not mean to imply that java was inherently not working in my case. I was simply stating that the trim function was not working - the way I was expecting it to. Though 'please be helpful or go away' was a sincere request and I do not plan on removing it. Do you have any suggestions as to my question, though? – Olin Kirkland Sep 09 '12 at 22:48
  • @OlinKirkland Nope; asking a prolific and bright SOer to go away because you don't like what they said is kind of a downer. – Dave Newton Sep 09 '12 at 22:55
  • @DaveNewton I didn't ask him to go away. I asked him to choose between being helpful or going away. He apparently chose the latter. I didn't get on SO to get in arguments with disgruntled individuals, I came for answers. I hoped you'd understand that. Though I wish you my deepest condolences if you found my prose inconvenient or inappropriate. Also, that semicolon should be a comma. <3 – Olin Kirkland Sep 09 '12 at 22:59
  • @OlinKirkland Your comment regarding the semicolon is incorrect. Semicolons may be used to separate independent clauses not connected with a coordinating conjunction. I find your prose stilted and verbose, not inconvenient. – Dave Newton Sep 09 '12 at 23:09
  • I love you. Let the mods know they can remove this question (I haven't really figured out how). Reposting question. Also, how is "Nope" an independent clause? – Olin Kirkland Sep 09 '12 at 23:16
  • @Olin: we're digressing here. Please post proof of your assertion with an [sscce](http://sscce.org). Give us something we can test else this exercise is nothing but a waste of electrons and time. – Hovercraft Full Of Eels Sep 09 '12 at 23:17
  • http://stackoverflow.com/questions/12343765/query-about-the-trim-method-in-java – Olin Kirkland Sep 09 '12 at 23:18
  • @OlinKirkland You're suggesting the word "no" can't be an independent clause? Anything that can stand alone as a sentence may be an independent clause, and a single-word answer is a valid sentence. The idea of not checking the assumptions you're making about your data is almost as silly as the idea that "no" can't stand on its own. – Dave Newton Sep 10 '12 at 17:25
  • Dave, I'm getting tired of this back and forth. But since you insist, http://en.wikipedia.org/wiki/Independent_clause 'No' can't stand on it's own. Independent clauses include both a subject and a verb. – Olin Kirkland Sep 10 '12 at 21:51
  • @OlinKirkland Oh, you're specifically talking about the phrase "independent clause", a different issue than whether or not there can be a semi-colon in my sentence (there can be). And yes, a sentence can consist of a single word. And it's "its". Tired of the back-and-forth? Stop. – Dave Newton Sep 11 '12 at 19:46
  • (@OlinKirkland Btw, [this](http://www-personal.umich.edu/~jlawler/IELL-Punctuation.pdf), and [this](http://en.wikipedia.org/wiki/Yes_and_no#English_grammar_classification). When it comes to grammar, a short page on Wikipedia will rarely be enough to determine whether or not a usage is considered correct.) – Dave Newton Sep 11 '12 at 20:49

2 Answers


Works without a problem for me.

Here is your code, refactored a bit and (maybe) more readable:

final String openingTag = "<td class=\"dddefault\">";
final String closingTag = "</td>";
List<String> splitSource2 = new ArrayList<String>();
splitSource2.add(openingTag + "Bob the Builder " + closingTag);
splitSource2.add(openingTag + "Sam the welder " + closingTag);
for (String string : splitSource2) {
    System.out.println("|" + string + "|");
}
List<String> splitSource3 = new ArrayList<String>();
for (String s : splitSource2) {
    if (s.length() > openingTag.length() && s.startsWith(openingTag)) {
        String nameWithoutOpeningTag = s.substring(openingTag.length());
        splitSource3.add(nameWithoutOpeningTag);
    }
}

System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
    String name = splitSource3.get(i);
    int closingTagBegin = name.length() - closingTag.length();
    String nameWithoutClosingTag = name.substring(0, closingTagBegin);
    String nameTrimmed = nameWithoutClosingTag.trim();
    splitSource3.set(i, nameTrimmed);
    System.out.println("|" + splitSource3.get(i) + "|");
}

I know this isn't a real answer, but I can't post comments yet, and this code wouldn't fit in a comment anyway, so I posted it as an answer so that Olin Kirkland can check his code against it.
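This likely explains why the refactored code works here but not on the scraped page: the test strings above are padded with ordinary spaces (U+0020), which trim() removes, while the question's EDIT note says the real data contained Unicode whitespace that trim() does not match. A minimal comparison sketch, assuming every cell starts with the opening tag and ends with "</td>" as in the code above; `TrimComparison` and `extractAndTrim` are hypothetical names.

```java
public class TrimComparison {

    static final String OPENING_TAG = "<td class=\"dddefault\">";
    static final String CLOSING_TAG = "</td>";

    // Assumes the cell is wrapped in exactly OPENING_TAG ... CLOSING_TAG;
    // strips both tags, then applies the standard String.trim().
    static String extractAndTrim(String cell) {
        String inner = cell.substring(OPENING_TAG.length(),
                cell.length() - CLOSING_TAG.length());
        return inner.trim();
    }

    public static void main(String[] args) {
        String asciiPadded = OPENING_TAG + "Bob the Builder   " + CLOSING_TAG;
        String nbspPadded = OPENING_TAG + "Bob the Builder\u00A0\u00A0\u00A0" + CLOSING_TAG;

        System.out.println("|" + extractAndTrim(asciiPadded) + "|"); // |Bob the Builder|
        // trim() leaves the three U+00A0 characters, so the length is 15 + 3
        System.out.println(extractAndTrim(nbspPadded).length()); // 18
    }
}
```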

Christoph Schubert
  • See, here's the thing. I'm not grabbing "Bob the builder" and "Sam the Welder" from hard code. I'm grabbing it from HTML, which is why this is being difficult (I think). I don't know what your answer is supposed to be. What are these openingTag and closingTag variables?? I grab the stuff from the SITE like that and am trying to get RID of it...? – Olin Kirkland Sep 09 '12 at 22:26
  • I only created the tags to make it more readable. But never mind, you got your answer in the other post^^ The purpose of my answer was only to make it more readable and executable. – Christoph Schubert Sep 10 '12 at 08:11

It just occurred to me that I used to have this sort of issue when I worked on a screen-scraping project. The key is that the downloaded HTML source sometimes contains non-printable characters that are not whitespace either. These are very difficult to copy and paste into a browser. I suspect this may have happened to you.

If my assumption is correct then you've got two choices:

  1. Use a binary reader to figure out what those characters are, and delete them with String.replace(). E.g.:

    private static String cutCharacters(String fromHtml) {
        String result = fromHtml;
        char[] problematicCharacters = {'\000', '\001', '\003'}; //this could be a private static final constant too
        for (char ch : problematicCharacters) {
            result = result.replace(String.valueOf(ch), ""); //replace(char, char) cannot delete, so use the CharSequence overload
        }
        return result; //fromHtml itself is unchanged; Strings are immutable
    }
    
  2. If you find some sort of recurring pattern in the HTML to be parsed, then you can use regexes and substrings to cut the unwanted parts. E.g.:

    private String getImportantParts(String fromHtml) {
        Pattern p = Pattern.compile("(\\w*\\s*)"); //keeps only word characters and ASCII whitespace, so punctuation such as '.' is dropped; this could be a private static final constant as well.
        Matcher m = p.matcher(fromHtml);
        StringBuilder buff = new StringBuilder();
        while (m.find()) {
            buff.append(m.group(1));
        }
        return buff.toString().trim();
    }
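Along the lines of option 2, a regex can also be anchored to the ends of the string, so that only leading and trailing junk is removed and interior punctuation (such as the "." in "Olin D. Kirkland") is kept. A sketch, assuming the unwanted characters are Unicode separators (including the no-break space U+00A0), ASCII whitespace, or control characters; `RegexTrim` and `regexTrim` are hypothetical names.

```java
import java.util.regex.Pattern;

public class RegexTrim {

    // Leading or trailing runs of: any Unicode separator (\p{Z}, which
    // includes U+00A0), ASCII whitespace (\s), or control characters
    // (\p{Cc}, such as '\000' or '\001').
    private static final Pattern EDGE_JUNK =
            Pattern.compile("^[\\p{Z}\\s\\p{Cc}]+|[\\p{Z}\\s\\p{Cc}]+$");

    static String regexTrim(String fromHtml) {
        return EDGE_JUNK.matcher(fromHtml).replaceAll("");
    }

    public static void main(String[] args) {
        String cell = "\u0001Undergraduate\u00A0\u00A0 \u00A0";
        System.out.println("'" + regexTrim(cell) + "'"); // 'Undergraduate'
    }
}
```

Compiling the pattern once as a constant avoids recompiling it for every scraped value.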
    
Tamas Rev
  • The problem was Unicode "non-breaking space" characters, and was answered at http://stackoverflow.com/questions/12343765/query-about-the-trim-method-in-java – Alnitak Sep 26 '12 at 10:42