2

I am writing code to spot country names in a text. I have a dictionary of country names (India, America, Sri Lanka, ...) and currently call text.contains(key) for each key in the dictionary. However, this returns true even for a string like Indiana. I also tried putting the words of the sentence into an array and calling contains on that, and a similar approach using equals, but both are really slow. Is there a faster way to do this?

Tazo
  • Why don't you post the relevant part of the code you consider slow? It will be easier to understand what you are doing and to help you improve it. – Bruno Reis Apr 03 '13 at 05:39
  • `contains()` returns true for partial matches, as you may have found out. But how have you determined that `equals()` is slower than `contains()`? Maybe I didn't understand your question well. – asgs Apr 03 '13 at 05:41
  • If I have to check against each word, then I will have to split the text (i.e. my input sentence) and match each word against each key in the dictionary. This is what I meant by being slow. Sorry if I couldn't convey it clearly! On the other hand, I do not need to split the input text if I use contains directly, but it also gives partial matches. – Tazo Apr 03 '13 at 05:46
  • possible duplicate of [Search for a word in a String](http://stackoverflow.com/questions/3879160/search-for-a-word-in-a-string) – Raedwald May 01 '14 at 07:48

3 Answers

9

Try using the regex word-boundary anchor \b:

s.matches(".*\\b" + key + "\\b.*")
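
A slightly more defensive variant of the one-liner above could look like the sketch below. It makes two assumptions that are not in the original answer: the key is wrapped in Pattern.quote so that regex metacharacters in a name are treated literally, and find() replaces matches() so the surrounding ".*" is not needed (by default "." does not match line terminators, so the matches() form can fail on multi-line text). The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class WordBoundaryMatch {
    // Returns true if key occurs in text as a whole word,
    // i.e. not as part of a longer word such as "Indiana".
    static boolean containsWord(String text, String key) {
        // Pattern.quote treats the key as a literal string.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(key) + "\\b");
        // find() searches anywhere in the text, so no ".*" padding is required.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("I live in India", "India"));          // true
        System.out.println(containsWord("I live in Indiana", "India"));        // false
        System.out.println(containsWord("Visit Sri Lanka soon", "Sri Lanka")); // true
    }
}
```

Compiling a new Pattern per call is wasteful when the same key is checked many times; in that case the compiled Pattern could be cached per key.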
Arun P Johny
1

Maybe you should be using some text processing library.

Here is a regex solution:

import java.util.regex.*;
import static java.lang.System.*;
public class SO {
    public static void main(String[] args) {
        String[] dict={"india","america"};
        String patStr=".*\\b(" + combine(dict,"|") + ")\\b.*";
        out.println("pattern: "+patStr+"\n");
        Pattern pat=Pattern.compile(patStr);

        String input1="hello world india indiana";
        out.println(input1+"\t"+pat.matcher(input1).matches());

        String input2="hello world america americana";
        out.println(input2+"\t"+pat.matcher(input2).matches());

        String input3="hello world indiana amercana";
        out.println(input3+"\t"+pat.matcher(input3).matches());
    }
    static String combine(String[] s, String glue){
      int k=s.length;
      if (k==0) return null;
      StringBuilder out=new StringBuilder();
      out.append(s[0]);
      for (int x=1;x<k;++x)
        out.append(glue).append(s[x]);
      return out.toString();
    }
}

Output:

pattern: .*\b(india|america)\b.*

hello world india indiana       true
hello world america americana   true
hello world indiana amercana    false
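
If you also need to know which names were found, the same alternation idea can be used with find() instead of matches(). The following is only a sketch under a few assumptions: the class name CountryFinder and its methods are hypothetical, the name list is non-empty, each name is passed through Pattern.quote, and matching is case-insensitive. The pattern is compiled once, so the text is scanned a single time regardless of how many names are in the dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CountryFinder {
    private final Pattern pattern;

    // Builds one pattern of the form \b(?:name1|name2|...)\b.
    // Assumes names is non-empty; an empty list is not handled here.
    CountryFinder(List<String> names) {
        StringBuilder alternatives = new StringBuilder();
        for (String name : names) {
            if (alternatives.length() > 0) {
                alternatives.append('|');
            }
            // Quote each name so it is matched literally.
            alternatives.append(Pattern.quote(name));
        }
        pattern = Pattern.compile("\\b(?:" + alternatives + ")\\b",
                                  Pattern.CASE_INSENSITIVE);
    }

    // Returns every whole-word occurrence, in order of appearance,
    // spelled as it appears in the text.
    List<String> findAll(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        CountryFinder finder =
            new CountryFinder(Arrays.asList("India", "America", "Sri Lanka"));
        // Prints: [india, Sri Lanka]
        System.out.println(finder.findAll("From india to Sri Lanka, via Indiana"));
    }
}
```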
Navin
0

contains() should have worked. You can also try String.indexOf(String): if it returns anything other than -1, the query string occurs in the text; otherwise it does not.
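
Note that indexOf on its own is a substring search, so it also reports India inside Indiana, just like contains(). One way to keep using indexOf while rejecting partial matches is to check the characters on either side of each hit. The sketch below is an illustration, not part of the original answer: the class and method names are hypothetical, and it assumes that a "word character" is whatever Character.isLetterOrDigit accepts.

```java
class IndexOfWordSearch {
    // Returns true if key occurs in text and is not directly preceded
    // or followed by a letter or digit.
    static boolean containsWord(String text, String key) {
        if (key.isEmpty()) {
            return false;
        }
        int from = 0;
        while (true) {
            int start = text.indexOf(key, from);
            if (start < 0) {
                return false; // no further occurrences
            }
            int end = start + key.length();
            boolean startOk = start == 0
                || !Character.isLetterOrDigit(text.charAt(start - 1));
            boolean endOk = end == text.length()
                || !Character.isLetterOrDigit(text.charAt(end));
            if (startOk && endOk) {
                return true;
            }
            // Partial match (e.g. "India" in "Indiana"): keep searching.
            from = start + 1;
        }
    }

    public static void main(String[] args) {
        System.out.println(containsWord("Indiana and India", "India")); // true
        System.out.println(containsWord("Indiana only", "India"));      // false
    }
}
```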

Pradeep Pati