1

I have a method that reads a file, puts each word into an array of strings, and then adds each word to a tree. I want to modify it so that a word is not added to the tree if it contains non-English characters (e.g. characters used in Spanish). I thought about the 'contains' method, but it doesn't work on an array of type String. How would I do it?

    public void parse(File f) throws Exception {

        Node root = new Node('+'); // create a root node

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(f)))) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] words = line.toLowerCase().split(" ");

                for (int i = 0; i < words.length; i++) {
                    addToTree(words[i], root);
                }
            } // end of while
        }
    }
ciastkoo
  • Can't you use the contains method on the String (words[i]) that you are trying to add to the tree? – Rush Apr 04 '13 at 15:09
  • You can use a regex that accepts only the letters a-z and A-Z, plus punctuation such as -;!,'. – Damian Leszczyński - Vash Apr 04 '13 at 15:10
  • http://stackoverflow.com/questions/2774320/how-to-know-if-a-string-contains-accents this should solve your issue. – Kazekage Gaara Apr 04 '13 at 15:11
  • This question is pretty meaningless unless you define exactly what 'English characters' are. For example, both English and Spanish are based on the Roman alphabet. Are you talking about excluding things like diacritics? – Perception Apr 04 '13 at 15:13
  • http://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters could be useful here – Tobias Apr 04 '13 at 15:15

2 Answers

3

You can use regex for that:

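    // requires: import java.util.regex.Pattern;
    Pattern nonEng = Pattern.compile("[^A-Za-z]");
    ...
    for (int i = 0; i < words.length; i++) {
        // find() returns true if the word has at least one character outside A-Z / a-z
        if (!nonEng.matcher(words[i]).find()) {
            addToTree(words[i], root);
        }
    }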

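This skips every word that contains at least one character outside A-Z and a-z, so only words composed entirely of unaccented Latin letters are added to the tree. Note that words containing digits, apostrophes, hyphens, or attached punctuation are rejected as well.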
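A self-contained sketch of that filter follows. The class name EnglishWordFilter and the methods isEnglishWord and keepEnglishWords are hypothetical names chosen for illustration, and a plain list stands in for the tree from the question. It assumes "English" means the 26 unaccented letters only, and it additionally skips empty strings, which the loop above does not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class EnglishWordFilter {
    // Matches any single character outside A-Z / a-z.
    private static final Pattern NON_ENGLISH = Pattern.compile("[^A-Za-z]");

    // True when the word is non-empty and contains only A-Z / a-z.
    static boolean isEnglishWord(String word) {
        return !word.isEmpty() && !NON_ENGLISH.matcher(word).find();
    }

    // Lowercases a line, splits it on whitespace, and keeps the words that pass the check.
    static List<String> keepEnglishWords(String line) {
        List<String> kept = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (isEnglishWord(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // \u00f1 is the letter n with a tilde, written as an escape to avoid source-encoding issues.
        System.out.println(keepEnglishWords("The ni\u00f1o ate jalape\u00f1o peppers"));
        // prints [the, ate, peppers]
    }
}
```

In the parse method from the question, the same check would wrap the addToTree call. If apostrophes or hyphens should count as part of a word, they would need to be added to the character class.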

Sergey Kalinichenko
0

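If valid words consist only of characters from [a-zA-Z_0-9], which is what \w matches by default in Java, the following returns true for any string that contains some other character:

    return !myString.matches("^\\w+$");

String.matches() always tests the whole string, so the ^ and $ anchors are optional here. If you have special requirements, such as allowing punctuation marks or other characters, add them to the character class as well, e.g. [\w.,;:'"]+ with matches(). The negated form [^\w.,;:'"] matches a single disallowed character and is meant for use with find().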
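As a rough illustration of the two checks, here is a small sketch. The class name WordCharCheck and its method names are hypothetical, and the punctuation set is just the one listed above, not a recommendation.

```java
class WordCharCheck {
    // True if the string contains anything other than [a-zA-Z_0-9] (or is empty).
    // String.matches must match the entire string, so no anchors are needed.
    static boolean hasNonWordChars(String s) {
        return !s.matches("\\w+");
    }

    // Same idea, but also tolerating . , ; : ' and " inside the string.
    static boolean hasDisallowedChars(String s) {
        return !s.matches("[\\w.,;:'\"]+");
    }

    public static void main(String[] args) {
        System.out.println(hasNonWordChars("hello_1"));         // false
        System.out.println(hasNonWordChars("don't"));           // true
        System.out.println(hasDisallowedChars("don't"));        // false
        System.out.println(hasDisallowedChars("ma\u00f1ana"));  // true
    }
}
```

Because \w is ASCII-only unless Pattern.UNICODE_CHARACTER_CLASS is enabled, accented letters such as the one in the last call are reported as disallowed. Digits and underscores pass both checks, which may or may not be wanted for a word list.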

Waqas Memon