40

For example, I'm extracting a text String from a text file and I need to split it into an array of words. However, when I do that, some words end with a comma (,) or a full stop (.), or even have brackets attached to them (which is all perfectly normal).

What I want to do is get rid of those characters. I've been trying to do that using the predefined String methods in Java, but I just can't figure it out.

gvlasov
Slavisa Perisic

7 Answers

181

Reassign the variable to a substring:

s = s.substring(0, s.length() - 1);

Alternatively, you might consider using a StringTokenizer to read the file, with the delimiters set to the characters you don't want to be part of words.
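A minimal sketch of the substring approach applied to an array of words. The class name, the helper `dropLast`, and the set of unwanted characters are assumptions made for illustration; the check guards against empty strings and against cutting off a character that should stay.

```java
import java.util.Arrays;

class TrailingPunctuation {
    // Characters treated as unwanted at the end of a word (an assumption for this sketch).
    private static final String UNWANTED = ",.)";

    // Returns the word without its last character if that character is unwanted.
    static String dropLast(String word) {
        if (!word.isEmpty() && UNWANTED.indexOf(word.charAt(word.length() - 1)) >= 0) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    public static void main(String[] args) {
        String[] words = "However, when I do all that.".split(" ");
        for (int i = 0; i < words.length; i++) {
            // substring returns a new String, so the result must be stored back.
            words[i] = dropLast(words[i]);
        }
        System.out.println(Arrays.toString(words));
    }
}
```

This removes only one trailing character per word; a word such as "normal)." would need the call repeated or a loop.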

Mark Byers
  • that's exactly what I did and it worked :) BTW I forgot to mention that the use of StringTokenizer class was strictly forbidden by my mentor. – Slavisa Perisic Dec 25 '09 at 23:27
  • This micro-benchmark suggests that substring() may be faster than regex in this context: http://groups.google.com/group/comp.lang.java.programmer/msg/cf4e57a09eb8ff7c – trashgod Dec 25 '09 at 23:45
  • 2
    @trashgod - you don't need a microbenchmark to tell you that. Just a tiny amount of common sense ... and looking at the source code of `String.substring()`. – Stephen C Dec 26 '09 at 01:12
17

Use:

String str = "whatever";
str = str.replaceAll("[,.]", "");

replaceAll takes a regular expression. This:

[,.]

...looks for each comma and/or period.
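The same character class can be extended to the brackets mentioned in the question. This is only a sketch; the class and method names are made up for illustration. Inside square brackets, the characters `.`, `(` and `)` are taken literally and need no escaping.

```java
class StripPunctuation {
    // Removes every comma, period, and round bracket anywhere in the input.
    static String stripPunctuation(String s) {
        return s.replaceAll("[,.()]", "");
    }

    public static void main(String[] args) {
        String[] words = {"However,", "stop.", "(which", "normal)."};
        for (String word : words) {
            System.out.println(stripPunctuation(word));
        }
    }
}
```

Note that this also removes the characters from the middle of a word, for instance the period in "3.14".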

Pshemo
OMG Ponies
7

To remove the last character, do as Mark Byers said:

s = s.substring(0, s.length() - 1);

Another way to remove the characters you don't want is to use the .replace(oldCharacter, newCharacter) method, as in:

s = s.replace(",","");

and

s = s.replace(".","");
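A short sketch of how the two calls can be chained, and of why `replace` is the safer choice for a literal period. The class and method names are hypothetical. `replace` treats its arguments as plain text, whereas `replaceAll` would interpret "." as a regular expression matching any character.

```java
class LiteralReplace {
    // Chains two literal replacements; no regular expressions are involved.
    static String clean(String s) {
        return s.replace(",", "").replace(".", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("one, two."));

        // For contrast: with replaceAll the unescaped "." matches every character,
        // so the whole string is removed.
        System.out.println("one.".replaceAll(".", "").isEmpty());
    }
}
```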
Community
Tom Neyland
4

You can't modify a String in Java; Strings are immutable. All you can do is create a new string that is a substring of the old one, minus the last character.

In some cases a StringBuffer might help you instead.
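The sketch below illustrates both points. It uses StringBuilder, the unsynchronized counterpart of StringBuffer, which is a substitution on my part; the class and method names are invented for the example.

```java
class ImmutableDemo {
    // Deletes the last character in place using a mutable StringBuilder.
    static String dropLastWithBuilder(String s) {
        if (s.isEmpty()) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s);
        sb.deleteCharAt(sb.length() - 1);
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "array.";

        // The result is discarded, so word still refers to the original string.
        word.substring(0, word.length() - 1);
        System.out.println(word);

        // Reassigning makes word refer to the newly created string.
        word = word.substring(0, word.length() - 1);
        System.out.println(word);

        System.out.println(dropLastWithBuilder("normal)"));
    }
}
```

Forgetting the reassignment is a common reason a call to substring appears to do nothing.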

bmargulies
  • Thank you. I managed to do something like this: [code] for (int i = 0; i < textArray.length; i++) { if ((textArray[i].endsWith(",")) || textArray[i].endsWith(".")) textArray[i].substring(textArray[i].indexOf(textArray[i].length()-1)); System.out.println(textArray[i].toLowerCase()); } [/code] – Slavisa Perisic Dec 25 '09 at 23:25
3

The best method is what Mark Byers explains:

s = s.substring(0, s.length() - 1);

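For example, if we want to replace a backslash \ with replaceAll, it doesn't work as written, because the first argument is a regular expression and the backslash is its escape character:

s.replaceAll("\\", "");     // throws PatternSyntaxException: the regex is a lone backslash

or

s.replaceAll("\\$", "");    // the regex \$ matches a literal dollar sign, not a backslash in a path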
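For comparison, here is a sketch of two forms that do replace backslashes with spaces. The class name, method names, and the sample path are hypothetical. The literal `replace` needs the backslash escaped once for the Java compiler, while `replaceAll` needs it escaped again for the regex engine.

```java
import java.util.regex.PatternSyntaxException;

class BackslashReplace {
    // Literal replacement: "\\" in Java source is a single backslash character.
    static String backslashesToSpaces(String s) {
        return s.replace("\\", " ");
    }

    // Regex replacement: four backslashes in source form the regex \\,
    // which matches one literal backslash.
    static String backslashesToSpacesRegex(String s) {
        return s.replaceAll("\\\\", " ");
    }

    public static void main(String[] args) {
        String path = "C:\\temp\\file.txt"; // hypothetical sample input
        System.out.println(backslashesToSpaces(path));
        System.out.println(backslashesToSpacesRegex(path));

        try {
            path.replaceAll("\\", " "); // the regex is a lone, dangling backslash
        } catch (PatternSyntaxException e) {
            System.out.println("invalid pattern: " + e.getDescription());
        }
    }
}
```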
AlexGach
0

Note that word boundaries also depend on the Locale. I think the best way to do it is to use the standard java.text.BreakIterator. Here is an example adapted from the java.sun.com tutorial.

import java.text.BreakIterator;
import java.util.Locale;

// The class name is arbitrary; the methods need an enclosing class to compile.
public class WordExtractor {

    public static void main(String[] args) {
        String text = "\n" +
                "\n" +
                "For example I'm extracting a text String from a text file and I need those words to form an array. However, when I do all that some words end with comma (,) or a full stop (.) or even have brackets attached to them (which is all perfectly normal).\n" +
                "\n" +
                "What I want to do is to get rid of those characters. I've been trying to do that using those predefined String methods in Java but I just can't get around it.\n" +
                "\n" +
                "Every help appreciated. Thanx";
        BreakIterator wordIterator = BreakIterator.getWordInstance(Locale.getDefault());
        extractWords(text, wordIterator);
    }

    // Walks the boundaries found by the iterator and prints each segment that
    // starts with a letter or digit, skipping whitespace and punctuation.
    static void extractWords(String target, BreakIterator wordIterator) {
        wordIterator.setText(target);
        int start = wordIterator.first();
        int end = wordIterator.next();

        while (end != BreakIterator.DONE) {
            String word = target.substring(start, end);
            if (Character.isLetterOrDigit(word.charAt(0))) {
                System.out.println(word);
            }
            start = end;
            end = wordIterator.next();
        }
    }
}

Source: http://java.sun.com/docs/books/tutorial/i18n/text/word.html

Chandra Patni
0

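You can use the replaceAll() method, reassigning the result each time. The first argument is a regular expression, so characters such as . and ( must be escaped:

s = s.replaceAll(",", "");
s = s.replaceAll("\\.", "");
s = s.replaceAll("\\(", "");

etc.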

fastcodejava