I am working on a Java and MySQL based application, and I have a task to check whether certain sets of adjacent words (phrases) are present in a given string that contains a lot of words. For example:

The string is

" java is a programing language .java is robust and powerful and it is also platform independent. "

I want to check whether the substrings

"programing language"
"platform independent"
and
"robust and powerful"
are present in the above string. The substrings must also match even if more than one whitespace character occurs between the words.
Geordy James
  • Do you know the substrings to search for beforehand? – geekprogrammer Dec 17 '15 at 12:45
  • "*check whether set of adjacent words present in a given string contains a lot of words.*" What? – user1803551 Dec 17 '15 at 12:46
  • You are interested in phrases rather than words. This is not as simple as finding words. This link may be of some use to you: http://stackoverflow.com/questions/1643616/algorithms-to-detect-phrases-and-keywords-from-text – bluelurker Dec 17 '15 at 12:53
  • Yes, I tried it using the indexOf method in Java, which returns -1 if the pattern is not found. The main problem with that approach is that it is not efficient when you want to check more than 100 patterns in 100 sentences. That is why I asked this question: to find the most efficient method for pattern matching in Java. – Geordy James Dec 18 '15 at 13:33

2 Answers

1

You could try something like:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String string = " java is a programing language .java is robust and powerful and it is also platform independent. ";

// Turn each space into "\s+" so any run of whitespace between the words matches
String subS1 = "programing language";
subS1 = subS1.replace(" ", "\\s+");
Pattern p1 = Pattern.compile(subS1);
Matcher match1 = p1.matcher(string); // the matcher comes from the pattern, not from the string

String subS2 = "platform independent";
subS2 = subS2.replace(" ", "\\s+");
Pattern p2 = Pattern.compile(subS2);
Matcher match2 = p2.matcher(string);

String subS3 = "robust and powerful";
subS3 = subS3.replace(" ", "\\s+");
Pattern p3 = Pattern.compile(subS3);
Matcher match3 = p3.matcher(string);

// find() returns true if the pattern occurs anywhere in the string
if (match1.find() && match2.find() && match3.find()) {
  // Whatever you like
}

Replace every space in each substring with '\s+', so that it also matches "programing [lots of whitespace] language". Then compile the pattern and create a matcher for the string from it. Repeat for each substring. Finally, test whether every matcher found something.
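The same steps can be wrapped in a small helper so that each phrase is compiled once and then reused for many strings. This is only a sketch: the class name PhraseMatcher and the methods toPattern, compileAll and containsAll are made up for illustration and are not part of the question. It also assumes the phrases are meant literally, so each word is passed through Pattern.quote in case it contains regex metacharacters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, named for illustration only.
class PhraseMatcher {

    // Build a pattern where the words of the phrase may be separated
    // by one or more whitespace characters.
    static Pattern toPattern(String phrase) {
        String[] words = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append("\\s+");
            }
            // Quote each word so characters like '.' or '+' are taken literally.
            regex.append(Pattern.quote(words[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Compile every phrase once so the patterns can be reused.
    static List<Pattern> compileAll(List<String> phrases) {
        List<Pattern> patterns = new ArrayList<>();
        for (String phrase : phrases) {
            patterns.add(toPattern(phrase));
        }
        return patterns;
    }

    // True only if every pattern occurs somewhere in the text.
    static boolean containsAll(String text, List<Pattern> patterns) {
        for (Pattern p : patterns) {
            if (!p.matcher(text).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String text = " java is a programing language .java is robust and powerful"
                + " and it is also platform independent. ";
        List<Pattern> patterns = compileAll(Arrays.asList(
                "programing language", "platform independent", "robust and powerful"));
        System.out.println(containsAll(text, patterns)); // prints true
    }
}
```

Compiling the patterns once avoids recompiling them for every string, but each string is still scanned once per pattern, so this does not change the overall amount of matching work.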

Some Notes

  • I didn't test it, but it should give an idea of how I would do it.
  • Also, this answers your very specific question. This method should not be used when you have a large number of substrings to check...
  • Please post some code you have already been working with, because right now I feel like I am doing your homework...
Nander Speerstra
  • Can you say which method I should use if I have a large number of substrings to check? I have to analyse more than 100 tuple sets in an employee table, and I have about 100 patterns to check in these 100 tuples. Is the above method efficient, or can you suggest a better way? – Geordy James Dec 18 '15 at 13:25
  • @GeordyJames, you had better start a new question with exact information about what you need, since this new question is quite different from the one I answered. What tuple sets? Do you have some code you already started with? In the new question, please give examples of (i) a substring, (ii) a tuple set, (iii) a pattern, (iv) what your program should do (the result) and (v) what you have already coded. I do not think that I, or the others who commented, understand your question well enough to give a useful answer. – Nander Speerstra Dec 22 '15 at 12:22
0

You can try this for the first part of your question, if I have not misunderstood it.

    String str  = " java is a programing language .java is robust and powerful and it is also platform independent. ";
    if (str.contains("programing language")) {
         System.out.println("programing language");
    }
    if (str.contains("platform independent")) {
         System.out.println("platform independent");
    }
    if (str.contains("robust and powerful")) {
        System.out.println("robust and powerful");
    }

I don't know what you mean by this: "The substring also must match even if more than one white space occur between the words."
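One possible reading of that requirement is that "programing    language" (with several spaces or a tab between the words) should still count as a match, which a plain contains call would miss. A minimal sketch under that assumption: collapse every run of whitespace to a single space before calling contains. The class name NormalizedContains and its methods are made up for illustration.

```java
// Hypothetical helper class, named for illustration only.
class NormalizedContains {

    // Collapse every run of whitespace (spaces, tabs, newlines) into one space.
    static String normalize(String s) {
        return s.trim().replaceAll("\\s+", " ");
    }

    // contains() on the normalized text and the normalized phrase.
    static boolean containsPhrase(String text, String phrase) {
        return normalize(text).contains(normalize(phrase));
    }

    public static void main(String[] args) {
        String str = " java is a programing    language .java is robust \t and powerful ";
        System.out.println(str.contains("programing language"));        // prints false
        System.out.println(containsPhrase(str, "programing language")); // prints true
        System.out.println(containsPhrase(str, "robust and powerful")); // prints true
    }
}
```

If the same string is checked against many phrases, it would be cheaper to normalize it once and reuse the result rather than normalizing inside every call.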

Bahramdun Adil