1

I have a problem: I need to find first occurrence of any symbol from string s2 (or array of char) in string s1.

Is there standard function for this purpose? If there isn't, whats the good implementation for this problem? (Of course I can run indexOf for every char from my s2, but this does't seem like a good algorithm, because if only the last symbol occurs in s1, we must run through s1 |s2|-1 times before I get an answer).

Thank you very much!

Gabe
  • 84,912
  • 12
  • 139
  • 238
user197284
  • 281
  • 1
  • 3
  • 7
  • Nope; the Java String class has nothing like this. Can you imagine a better algorithm than the one you describe? – maerics Nov 14 '11 at 21:18
  • 2
    I would pack the character of string s2 into a [regular expression](http://download.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html) e.g. 'a|b|c|d' (special characters must be escaped) and then use Matcher.find(..) to get the first occurrence. – Andre Holzner Nov 14 '11 at 21:20
  • @maerics I doubt it's faster. Using a regular expression like that you only need to iterate once through the string. – Paul Nov 14 '11 at 21:23
  • Although you could also just iterate through it manually checking if each character is in the array of char (s2) which would be even faster. He'll have to be doing this millions of times though for any of these to be noticably different. – Paul Nov 14 '11 at 21:24
  • I've posted a simplistic functioning answer which shows how it might be done using regex as @AndreHolzner suggests. – Tim Bender Nov 14 '11 at 21:31

4 Answers4

5

Put all characters from s2 into a constant-time lookup data structure (e.g. HashSet). Iterate over each character in s1 and see if your data structure contains that character.

Roughly (untested):

public int indexOfFirstContainedCharacter(String s1, String s2) {
  Set<Character> set = new HashSet<Character>();
  for (int i=0; i<s2.length; i++) {
    set.add(s2.charAt(i)); // Build a constant-time lookup table.
  }
  for (int i=0; i<s1.length; i++) {
    if (set.contains(s1.charAt(i)) {
      return i; // Found a character in s1 also in s2.
    }
  }
  return -1; // No matches.
}

This algorithm is O(n) as opposed to O(n^2) in the algorithm you describe.

maerics
  • 151,642
  • 46
  • 269
  • 291
4

Using regex:

   public static void main(final String[] args) {
      final String s1 = "Hello World";
      final String s2 = "log";

      final Pattern pattern = Pattern.compile("[" + Pattern.quote(s2) + "]");
      final Matcher matcher = pattern.matcher(s1);
      if (matcher.find()) {
         System.out.println(matcher.group());
      }
   }
Tim Bender
  • 20,112
  • 2
  • 49
  • 58
  • you should escape special characters before compiling the pattern, e.g. using `Pattern.quote(..)`, see http://stackoverflow.com/questions/60160/how-to-escape-text-for-regular-expression-in-java – Andre Holzner Nov 14 '11 at 22:19
  • Yeah, this answer probably needs a fair amount of beefing up to be robust. A string that contains a hyphen would also cause some issues I think. – Tim Bender Nov 14 '11 at 22:41
3

What you are looking for is indexOfAny from Apache StringUtils.

It looks like the implementation is:

 public static int indexOfAny(String str, char[] searchChars) {
   if (isEmpty(str) || ArrayUtils.isEmpty(searchChars)) {
     return -1;
   }
   for (int i = 0; i < str.length(); i++) {
     char ch = str.charAt(i);
       for (int j = 0; j < searchChars.length; j++) {
         if (searchChars[j] == ch) {
           return i;
         }
       }
     }
    return -1;
  }
Casey
  • 12,070
  • 18
  • 71
  • 107
  • Linear search? Ow, better make sure that `searchChars` is never large. I thought Apache would do something a bit more sophisticated. – Daniel Fischer Nov 14 '11 at 21:43
3

What is meant by symbol in this context? If it's just a 16-bit Java char, it's easy. Make a lookup table (array) for all possible values, indicating whether they appear in s2. Then step through s1 until either you've found a symbol from s2 or you've reached the end of s1. If a symbol is a Unicode code-point, it's more complicated, but the above gives a method to find out where you need to take a closer look.

Daniel Fischer
  • 181,706
  • 17
  • 308
  • 431
  • you mean, you would initialize an array of 2^16 booleans where only the elements corresponding to an character in s2 are set to true ? (interesting idea; I think a HashSet could perform nearly as well) – Andre Holzner Nov 14 '11 at 21:27
  • @AndreHolzner Yes, that's the idea. 2^16 is small enough, so there's no need to use fancy stuff like HashSets, the simple array will do. I'm not familiar enough with the JVM, but in C I would be quite confident that it's also faster, because it fits comfortably in L2 (given normal hardware). Were `char`s 20 bits or more, the array would become too big, so then definitely some kind of set. – Daniel Fischer Nov 14 '11 at 21:38