1

I've got an array of strings and I need to get all strings, that start with some 'prefix'. I wanna use Array.BinarySearch(). Is it possible? And how should I write a comparer if so?

sword
  • 13
  • 4
  • 2
    `BinarySearch` is for finding _one_ item in a _sorted_ array, which means you'll have to pre-sort the array based on that criteria. If that's what you want then it's possible, but using a normal search routing would probably be more appropriate. If you want to find _all_ items that start with a value then `BinarySearch` would not work at all. – D Stanley Nov 19 '14 at 15:29
  • Even if you could pull it off, it would not be more efficient than a linear search. The point of Binary searching is using it for multiple searches on the same set. If you are using it for just one search, just sorting the array would take longer than single linear search. – Rotem Nov 19 '14 at 15:32
  • In a comment on an answer you mention that you have a sorted list of 100k strings, and another (also sorted?) list of 10k prefixes. What is your actual problem that you're trying to solve? http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem – sisve Nov 19 '14 at 15:52
  • @SimonSvensson, I need to get 10 words from 100000 list of sorted words for each prefix from 10000 sorted list of prefixes – sword Nov 19 '14 at 15:56
  • Do any of the prefixes overlap (e.g. does it contain `Foo` and `Foobar`? Is the list of prefixes sorted? – D Stanley Nov 19 '14 at 16:04
  • @DStanley, prefixes are sorted. And they overlap (i.e. 'aaca', 'aaac') – sword Nov 19 '14 at 16:07
  • The solution is to use trie – sword Nov 20 '14 at 12:09

2 Answers2

3

No, you cannot use BinarySearch in this case. You could use Enumerable.Where instead:

Dim query = From str In array Where str.StartsWith("prefix")

or with (ugly in VB.NET) method synatx:

query = array.Where(Function(str) str.StartsWith("prefix"))

Edit: whoops, C#

var query = array.Where(s => s.StartsWith("prefix"));

Use ToArray if you want to create a new filtered array.

Community
  • 1
  • 1
Tim Schmelter
  • 450,073
  • 74
  • 686
  • 939
  • The thing is that I've got List of 100000 sorted words and another List of 10000 prefixes. So 10000 calls of Where operator are taking too much time. So I've decided to use BinarySearch. Does this decision has any sense? – sword Nov 19 '14 at 15:41
  • @sword no, because `BinarySearch` only returns one index, and you'd have to sort the array for each prefix, which would take longer than a linear search. – D Stanley Nov 19 '14 at 15:49
  • @DStanley, and what if I got all prefixes and all words that start by one letter? So I have no need to sort array – sword Nov 19 '14 at 15:53
  • You could implement your own BinarySearch to find the first item that matches your prefix and then scan while the prefix still matches. – D Stanley Nov 19 '14 at 15:55
  • @DStanley, So it is impossible to get first suitable index and then set start search index for BinarySearch? – sword Nov 19 '14 at 15:59
  • 1
    `BinarySearch` does not help you because it returns _one_ index (and not necessarily the _first_ index). – D Stanley Nov 19 '14 at 16:04
  • @DStanley, I see. Thank you. However I still don't know how to solve my task) – sword Nov 19 '14 at 16:10
1

It's easy to create your own StartsWithComparer:

class StartsWithComparer : IComparer<string>
{
    public int Compare(string a, string b) {
        if(a.StartsWith(b)) {
            return 0;
        }
        return a.CompareTo(b);
    }
}

As others pointed out, this will only return one index. You can have a couple of helpers to return all items:

IEnumerable<string> GetBefore(IList<string> sorted, int foundIndex, string prefix) {
    for(var i = foundIndex - 1; i >= 0; i--) {
        if(sorted[i].StartsWith(prefix)) {
            yield return sorted[i];
        }
    }
}

IEnumerable<string> GetCurrentAndAfter(IList<string> sorted, int foundIndex, string prefix) {
    for(var i = foundIndex; i < sorted.Count; i++) {
        if(sorted[i].StartsWith(prefix)) {
            yield return sorted[i];
        }
    }
}

Then to use it:

var index = sorted.BinarySearch("asdf", new StartsWithComparer());
var previous = GetBefore(sorted, index, "asdf");
var currentAndAfter = GetCurrentAndAfter(sorted, index, "asdf");

You can wrap the whole thing in your own class, with a single method that returns all items that start with your prefix.

FarmerBob
  • 1,314
  • 8
  • 11