10

If I am looking for a particular word inside a string, for example "are" in the string "how are you", would a plain indexOf() work faster and better, or a regex matches()?

String testStr = "how are you";
String lookUp = "are";

//METHOD1
if (testStr.indexOf(lookUp) != -1)
{
 System.out.println("Found!");
}

//OR
//METHOD 2
if (testStr.matches(".*" + lookUp + ".*"))
{
 System.out.println("Found!");
}

Which of the two methods above is a better way of looking for a string inside another string? Or is there a much better alternative?

  • Ivard
Raedwald
topgun_ivard
  • Isn't this an exact duplicate of this: http://stackoverflow.com/q/3876246/450398 – Grodriguez Oct 07 '10 at 06:46
  • In J2SE 1.6 the method is named String.matches: http://download.oracle.com/javase/1.4.2/docs/api/java/lang/String.html#matches(java.lang.String) – A_Var Oct 07 '10 at 17:17

9 Answers

18

If you don't care whether it's actually the entire word you're matching, then indexOf() will be a lot faster.

If, on the other hand, you need to be able to differentiate between are, harebrained, aren't etc., then you need a regex: \bare\b will only match are as an entire word (\\bare\\b in Java).

\b is a word boundary anchor. It matches the empty position between an alphanumeric character (letter, digit, or underscore) and a non-alphanumeric character, or between an alphanumeric character and the start or end of the string.
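To make the difference concrete, here is a minimal sketch of a whole-word check. The class name WholeWordDemo and the helper containsWord are made up for illustration, and wrapping the term in Pattern.quote is my own addition so that metacharacters in the search term are taken literally.

```java
import java.util.regex.Pattern;

class WholeWordDemo {
    // Hypothetical helper: true if `word` occurs in `text` as a whole word.
    static boolean containsWord(String text, String word) {
        // Pattern.quote makes any regex metacharacters in `word` literal.
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("how are you", "are"));      // true
        System.out.println(containsWord("harebrained idea", "are")); // false
        System.out.println(containsWord("they aren't here", "are")); // false
        // indexOf only looks for the substring, so it also hits inside a longer word.
        System.out.println("harebrained idea".indexOf("are") != -1); // true
    }
}
```

Note that find() is used rather than matches(), since matches() would require the whole input to match the pattern.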

Caveat: This also means that if your search term isn't actually a word (let's say you're looking for ###), then these word boundary anchors will only match in a string like aaa###zzz, but not in +++###+++.
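The caveat above can be reproduced with a few lines. This is only a sketch; BoundaryCaveatDemo and foundWithBoundaries are hypothetical names, and the two test strings are the ones from the paragraph above.

```java
import java.util.regex.Pattern;

class BoundaryCaveatDemo {
    // Hypothetical helper: searches for `term` surrounded by \b anchors.
    static boolean foundWithBoundaries(String text, String term) {
        return Pattern.compile("\\b" + Pattern.quote(term) + "\\b")
                      .matcher(text)
                      .find();
    }

    public static void main(String[] args) {
        // 'a' and 'z' are word characters, '#' is not, so both boundaries exist.
        System.out.println(foundWithBoundaries("aaa###zzz", "###")); // true
        // '+' and '#' are both non-word characters, so there is no boundary.
        System.out.println(foundWithBoundaries("+++###+++", "###")); // false
        // A plain substring search is not affected.
        System.out.println("+++###+++".contains("###"));             // true
    }
}
```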

Further caveat: Java has by default a limited worldview on what constitutes an alphanumeric character. Only ASCII letters/digits (plus the underscore) count here, so word boundary anchors will fail on words like élève, relevé or ärgern. Read more about this (and how to solve this problem) here.
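One option I am aware of (an assumption on my part, not necessarily what the linked article recommends) is the Pattern.UNICODE_CHARACTER_CLASS flag, available since Java 7, which makes \b and \w follow Unicode definitions. The sketch below only shows the flagged behaviour, because how \b treats non-ASCII letters without the flag has differed between Java versions. The class and method names are hypothetical, and the accented word is written with Unicode escapes to avoid source-encoding issues.

```java
import java.util.regex.Pattern;

class UnicodeBoundaryDemo {
    // Hypothetical helper: whole-word search with Unicode-aware \b.
    static boolean containsWordUnicode(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                                    Pattern.UNICODE_CHARACTER_CLASS);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        // The word from the answer above, spelled with escapes for the accented letters.
        String word = "\u00e9l\u00e8ve";
        System.out.println(containsWordUnicode("un " + word + " studieux", word)); // true
        // Followed by another letter, so it is not a whole word here.
        System.out.println(containsWordUnicode("des " + word + "s", word));        // false
    }
}
```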

Community
Tim Pietzcker
1

Method one should be faster because it has less overhead. If the concern is performance when searching huge files, a specialized algorithm such as Boyer-Moore pattern matching could bring further improvements.
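For illustration, here is a minimal sketch of the Boyer-Moore-Horspool simplification, which keeps only the bad-character table; it is not the full Boyer-Moore algorithm and not what String.indexOf does internally. The names HorspoolSearchDemo and horspoolIndexOf are made up, and the 65,536-entry table is a simplicity-over-memory choice.

```java
import java.util.Arrays;

class HorspoolSearchDemo {
    // Returns the index of the first occurrence of `pattern` in `text`, or -1.
    static int horspoolIndexOf(String text, String pattern) {
        int n = text.length();
        int m = pattern.length();
        if (m == 0) return 0;
        if (m > n) return -1;

        // Bad-character table: how far the window may move when the text
        // character aligned with the pattern's last position is c.
        int[] shift = new int[Character.MAX_VALUE + 1];
        Arrays.fill(shift, m);
        for (int i = 0; i < m - 1; i++) {
            shift[pattern.charAt(i)] = m - 1 - i;
        }

        int pos = 0;
        while (pos <= n - m) {
            // Compare the window against the pattern from right to left.
            int j = m - 1;
            while (j >= 0 && text.charAt(pos + j) == pattern.charAt(j)) {
                j--;
            }
            if (j < 0) return pos;
            pos += shift[text.charAt(pos + m - 1)];
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(horspoolIndexOf("how are you", "are")); // 4
        System.out.println(horspoolIndexOf("how are you", "xyz")); // -1
    }
}
```

For short strings like the one in the question, the table setup probably costs more than it saves; the skipping only pays off on long inputs and longer patterns.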

Alan Moore
stacker
  • For some reason the link isn't displayed: http://en.wikipedia.org/wiki/Boyer–Moore_string_search_algorithm – stacker Oct 07 '10 at 06:34
  • The dash in `Boyer-Moore` was really an en-dash (`U+2013`). I don't know offhand if that's legal in a URL, but SO doesn't seem to like it. – Alan Moore Oct 07 '10 at 06:47
1

If you are looking for a fixed string, not a pattern, as in the example in your question, indexOf will be better (simpler) and faster, since it does not need to use regular expressions.

Also, if the string you are searching for does contain characters that have a special meaning in regular expressions, with indexOf you don't need to worry about escaping these characters.
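A small sketch of the escaping problem, using a search term I picked for illustration ("1+1") and hypothetical helper names. Pattern.quote is the standard way to make a term literal if you do need it inside a regex.

```java
import java.util.regex.Pattern;

class EscapingDemo {
    static boolean viaIndexOf(String text, String term) {
        return text.indexOf(term) != -1;
    }

    // The term is spliced into the regex as-is, so '+' means "one or more".
    static boolean viaRawRegex(String text, String term) {
        return text.matches(".*" + term + ".*");
    }

    // Pattern.quote turns the term into a literal.
    static boolean viaQuotedRegex(String text, String term) {
        return text.matches(".*" + Pattern.quote(term) + ".*");
    }

    public static void main(String[] args) {
        String text = "1+1=2";
        String term = "1+1";
        System.out.println(viaIndexOf(text, term));     // true
        System.out.println(viaRawRegex(text, term));    // false: the regex looks for "11", "111", ...
        System.out.println(viaQuotedRegex(text, term)); // true
    }
}
```

A term such as "(" is worse still: spliced in unquoted, it makes the pattern invalid and matches() throws a PatternSyntaxException.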

In general, use indexOf where possible, and use matches for pattern matching, where indexOf cannot do what you need.

Grodriguez
0

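I use this:

// Returns true if `what` occurs in `search`.
// Caution: replaceAll treats `what` as a regex, and the check fails
// when the only match is "_" itself (the replacement changes nothing).
public boolean searchStr(String search, String what) {
    return !search.replaceAll(what, "_").equals(search);
}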

Example use:

String s = "abc";
String w = "bc";
if(searchStr(s,w)) { 
    //this returns true
}
s="qwe";
w="asd";
if(searchStr(s,w)) { 
    //this returns false
}
alestanis
barwnikk
  • 1
    Welcome to SO. Here it is good practice to explain why to use your solution, not just how. That will make your answer more valuable and help future readers understand how it works. I also suggest that you have a look at our FAQ: http://stackoverflow.com/faq. – ForceMagic Oct 29 '12 at 20:50
0

If you are looking for one string inside another, you should use the indexOf or contains method. Example: see whether "foo" is present in a string.

But if you are looking for a pattern, use the matches method.
Example: see whether "foo" is present at the beginning or end of the string, or whether it is present as a whole word.

Using the matches method for simple string searching is inefficient because of the regex engine overhead.
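A rough sketch of the two cases, with hypothetical names. I use a precompiled Pattern with find() for the anchored case, which is my own choice here; String.matches would need ".*" padding and recompiles the pattern on every call.

```java
import java.util.regex.Pattern;

class SearchChoiceDemo {
    // Compiled once and reused: "foo" at the very start or the very end.
    static final Pattern FOO_AT_EDGE = Pattern.compile("^foo|foo$");

    // Fixed substring: no regex needed.
    static boolean hasFoo(String s) {
        return s.contains("foo");
    }

    // Positional requirement: this is where a pattern earns its keep.
    static boolean fooAtEdge(String s) {
        return FOO_AT_EDGE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasFoo("a foo b"));    // true
        System.out.println(fooAtEdge("a foo b")); // false
        System.out.println(fooAtEdge("foobar"));  // true
        System.out.println(fooAtEdge("barfoo"));  // true
    }
}
```

For the edge cases alone, String.startsWith and String.endsWith would also do the job without any regex.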

codaddict
0

The first method is faster, and since this is not a complex expression there is no reason to use regex here.

Emil
0

Of course indexOf() is better than matches(). A single matches() call involves many comparisons (a==a, r==r, e==e); on top of that, the wildcards you append have to be tried in many combinations:

    ?are
    ??are
    ???are
    ????are
    ........
    are
    are?
    are??
    are???

until the candidate is as long as the original string.

shenju
0

Your question practically answers itself; if you have to ask whether regex is the better choice, it almost certainly isn't. Also, when you're choosing between regex and non-regex solutions, performance should never be your primary criterion. Wait until you've got some working code and profile it.
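If you do get to the measuring stage, a crude timing loop like the one below can give a first impression. It is only a sketch with made-up names; naive System.nanoTime loops are easily distorted by JIT warm-up and dead-code elimination, so a proper benchmark harness such as JMH is the more trustworthy tool. No timings are shown because they depend entirely on the machine and JVM.

```java
import java.util.regex.Pattern;

class CrudeTimingDemo {
    // Counts hits so the JIT cannot discard the search as unused work.
    static int countWithIndexOf(String text, String term, int runs) {
        int hits = 0;
        for (int i = 0; i < runs; i++) {
            if (text.indexOf(term) != -1) hits++;
        }
        return hits;
    }

    static int countWithRegex(String text, Pattern pattern, int runs) {
        int hits = 0;
        for (int i = 0; i < runs; i++) {
            if (pattern.matcher(text).find()) hits++;
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "how are you";
        String term = "are";
        Pattern pattern = Pattern.compile(Pattern.quote(term)); // compiled once
        int runs = 1_000_000;

        // Warm-up pass so both code paths have a chance to be compiled.
        countWithIndexOf(text, term, runs);
        countWithRegex(text, pattern, runs);

        long t0 = System.nanoTime();
        int a = countWithIndexOf(text, term, runs);
        long t1 = System.nanoTime();
        int b = countWithRegex(text, pattern, runs);
        long t2 = System.nanoTime();

        System.out.println("indexOf: " + (t1 - t0) / 1_000_000 + " ms, hits=" + a);
        System.out.println("regex:   " + (t2 - t1) / 1_000_000 + " ms, hits=" + b);
    }
}
```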

Alan Moore
0

A better way to compare the two versions is to analyze the source code of the indexOf method and the regex matches method themselves, working out the running time of both implementations in Big-O notation and comparing their best, average and worst cases (character sequence found at the start, middle or end of the string, respectively). The source code is here: indexOf_source, and here: regex.matches. We need to do a run-time analysis of both to see exactly what each one is doing. It is a tedious task, but it is the only way to make a true comparison; everything else is only assumption. Good question, though.

A_Var