2

I want to compare if a provided string starts with any of the strings in an array. The easiest solution being:

String b = ...;
boolean matched = false;
for (String a : array) {
  if (b.startsWith(a))
    match = true;
}

However, intuitively, I want to use something like a trie to get better efficiency since the array of strings might grow to be pretty big and I need to run these matches fast. I can guarantee that these strings are all alphabetical. I can also guarantee that all strings in the array are length 2 or less. What's the best way to implement this trie-like structure in Java? I couldn't find any Java-based libraries that does this.

Thanks!

Jin
  • 6,055
  • 2
  • 39
  • 72
  • You might look at http://stackoverflow.com/questions/623892/where-do-i-find-a-standard-trie-based-map-implementation-in-java or this https://forums.oracle.com/forums/thread.jspa?messageID=8787521 – CPerkins Apr 16 '13 at 18:43

3 Answers3

5

If you truly have enough start strings that it becomes a bottleneck, a trie might indeed help.

This question has been asked and answered on this very site: Where do I find a standard Trie based map implementation in Java?

And this was the answer: https://forums.oracle.com/forums/thread.jspa?messageID=8787521

Community
  • 1
  • 1
CPerkins
  • 8,968
  • 3
  • 34
  • 47
  • Thanks for the links, I remember looking at the post and not seeing a standard implementation. I ended up just using a regular TreeSet and will profile the performance first before getting other libraries. Thanks again! – Jin Apr 16 '13 at 19:44
2

I want to compare if a provided string starts with any of the strings in an array.

Well - you can certainly improve on your current solution :

static boolean startsAny(final String b) {
    for (String a : array) {
        if (b.startsWith(a)) {
            return true;
        }
    }
    return false
}

You can use String#matches with a regular expression, but I'm not sure this is more efficient. Have you profiled the code and identified this as a bottleneck?

Amir Afghani
  • 37,814
  • 16
  • 84
  • 124
  • 1
    Good question about profiling first, and this is faster (but won't compile - it needs a final `return false`). – CPerkins Apr 16 '13 at 18:17
  • Good suggestion on profiling.. It's possible that I am optimizing too early, I'll run some tests and see. – Jin Apr 16 '13 at 19:41
2

A simple solution is to insert the strings into a Set<String> and then perform two lookups on it, one with the first character of b and then if not match with the first two characters of b.

For example,

class StartsWithAny {
    private Set<String> set;

    public StartsWithAny(String[] array) {
        set = new HashSet<String>();
        for (String a : array) {
            set.add(a);
        }
    }

    // returns true if b starts with any strings contained in array
    // with the condition that b.length() <= 2
    public boolean startsWithAny(final String b) {
        if (b.length() > 0 && set.contains(b.substring(0, 1))) {
            return true;
        }

        if (b.length() > 1 && set.contains(b.substring(0, 2))) {
            return true;
        }

        return false;
    }
}

A variation on this would be to utilize two separate Sets, one for the single character lookup and one for the two character lookup which would improve performance a bit.

An alternative but similar approach would be to implement a binary search algorithm that would operate on the sorted array and perform a similar function.

Ed Plese
  • 1,568
  • 10
  • 14
  • I actually ended up simply using a TreeSet and doing a contains() lookup, not sure if that's significantly slower/faster than doing separate lookups, but it's only 2 characters, so it shouldn't be that much different I think.. Thanks for the suggestion! – Jin Apr 16 '13 at 19:43