-5

The challenge: Given a sorted array of names (only first name) and a name X for which I need to find out how many times it is present in the array. Constraint: Only binary search is allowed.Also I cannot use any string library function. Sample Input:

3
abc
abc
pqr
pqr

output: 1(as pqr occurs just once in the array of strings/words).

Jim Mischel
  • 131,090
  • 20
  • 188
  • 351
204
  • 433
  • 1
  • 5
  • 19

3 Answers3

1

1) Compare string without using string library: implement the string comparison function by yourself: compare the first characters of both strings, only if they are equal, compare the next characters, and so on.

2) Sorted array and binary search: Binary search is perfect for a sorted array. Let x be the query string.

Step 1: Use binary search to find x in the sorted array.

Step 2: If cannot found, return 0.

Step 3: If found, binary search for x in the left and right subarray. Record the result indexes.

Repeat step 3 (Note: smaller left, right sub-arrays in each loop), stop the search in left (or right) direction if cannot find x in left (or right) subarray. The previous found indexes of left and right gives the range of results.

cuongptnk
  • 472
  • 3
  • 15
  • Should not have answered. OP did not show any effort of him trying to do it by himself. Don't spoon-feed. Also, the question is badly written: he stated that he knows how to do it with linear search but it's not an option. In your answer, you combine linear and binary searches. We don't know if it's ok to do so. – Fureeish Jan 29 '17 at 16:16
  • About the question: if you want to know the number of equal elements, the minimum number of comparisons is the number of those equal elements. Linear scan the whole array is different from linear scan only the equal elements – cuongptnk Jan 29 '17 at 16:21
  • About the OP: the answer gives the idea, not the code. To get it done, the OP still need to do the work. I have seen a lot of questions in here, where the OP asks something similar and give some codes. Then people jump in and give him the perfect running code. In my opinion, that is even worst; but no one says it is bad. – cuongptnk Jan 29 '17 at 16:24
  • Considering weridness of the task, I would not be surprised if the aim was to binary search X, add 1 to count and remove that string. Repeat untill BS fails to find. Regardless, questions like that should not be answered for various reasons. @JJJ's comment sums it up well – Fureeish Jan 29 '17 at 16:24
  • @Fureeish Right. Using only the BS is better. I update the answer. – cuongptnk Jan 29 '17 at 16:31
  • Depending on what "only binary search is allowed" really means, there are variants of binary search that can easily find both endpoints of a range of equal elements. Even if those are not allowed, the OP can always first search for the last element less than the target, and then search the rest of the array for the first element greater than the target. – Ilmari Karonen Jan 29 '17 at 16:46
  • @IlmariKaronen, how do you know which elements are less than or greater the given query ? – cuongptnk Jan 29 '17 at 16:50
  • Um... using the string comparison code? If you can't tell which items are less than or greater than the target, you can't use binary search anyway. – Ilmari Karonen Jan 29 '17 at 17:00
  • No i mean: you suggest to search for element less than the target. But how to get that element ? Your assumption here is that given the query, you can get the element less than it. How to do that without doing a search ? – cuongptnk Jan 29 '17 at 17:02
  • @IlmariKaronen Also, searching for element less than the target still suffer the equality effect (for that element). In other words, get index of the less element does not guarantee that is the right lower bound. – cuongptnk Jan 29 '17 at 17:04
  • Given any monotone increasing boolean-valued function (such as `x → (x >= target)` or `x → (x > target)`), a binary search can be used to find the point in a sorted array where the function changes from 0 to 1. Do that for both of the functions I just mentioned, and you've found the endpoints of the range of elements equal to the target. – Ilmari Karonen Jan 29 '17 at 17:16
  • @IlmariKaronen so what you are saying is the same as the answer i given ? I thought you want to get the element less than the query, than search that element for the lower bound. – cuongptnk Jan 29 '17 at 17:21
  • @cuongptnk: Please see the answer I just posted. I've tried to avoid providing complete copy-pasteable code, so that I wouldn't do the OP's homework for them, but hopefully both you and the OP can see how to solve the problem using the method I describe in my answer. – Ilmari Karonen Jan 29 '17 at 17:46
1

There are plenty of questions on SO about how to compare strings in C, so I will just direct you to those questions and their answers. Basically, the solution is to compare the strings character by character until you either find a pair of characters that differ, or until you reach the end of both strings. Then compare the characters at the point you've reached, and return the result of that comparison.

As for using binary search to find the number of times a given target string occurs in a sorted array, probably the simplest efficient solution is to do two binary searches: one for the starting point and one for the end point of the range of array elements equal to the target. The distance from the starting point to the end point will then give the length of the range.

The following binary search routine will return the index of the last element in the array haystack that is less than the target needle (or -1, if there is no such element):

int find_last_less_than(int needle, int *haystack, int length) {
    int base = -1, step = length - base;

    // loop invariants:
    // 1. base == -1 || haystack[base] < needle
    // 2. base + step >= length || haystack[base + step] >= needle
    while (step > 1) {
        step = (step + 1) / 2;  // divide interval in half, rounding up
        int index = base + step;
        if (index < length && haystack[index] < needle) base += step;
    }
    return base;
}

Note that for simplicity (and to avoid just spoon-feeding you the entire solution) this code searches an array of integers rather than of strings, but I trust that you can figure out how to modify it as needed.

Hopefully, it should also be clear how to modify the search so that, instead of finding the last element less than the target, it will find the index of the last element less than or equal to the target. Subtracting the former from the latter will then yield the number of times the target occurs in the array.

Community
  • 1
  • 1
Ilmari Karonen
  • 49,047
  • 9
  • 93
  • 153
  • I see `find_last_less_than` as the `variants of binary search` in your comment. Good to know. I always thought of one standard BS. – cuongptnk Jan 29 '17 at 18:00
  • 1
    I would consider this a standard binary search. By variants, I meant extending the code to find both endpoints in a single search. One fairly simple optimization in that vein would be to add an extra variable `int limit = length;` to the function and an extra `else if (index < length && haystack[index] > needle) limit = index;` line inside the loop. That way, after finding the lower bound `base`, you would only need to search the range from `base` up to `limit-1` for the upper bound. – Ilmari Karonen Jan 29 '17 at 18:10
  • I think your function is Exponential search variant as the definition in Wiki, https://en.wikipedia.org/wiki/Binary_search_algorithm#Exponential_search – cuongptnk Jan 29 '17 at 18:18
  • Yeah. Recording partial result when searching the base does speed up finding the limit. – cuongptnk Jan 29 '17 at 18:25
  • No, it's not an exponential search (although it could be made into one, by replacing the `while (2*step <= length)` initialization loop with `while (2*step <= length && haystack[2*step-1] < needle)`). Indeed, the entire initialization loop is just there to round up the length of the array to a power of 2 (in order to simplify the search loop below), and could equally well be replaced e.g. with [bit manipulation hacks](http://stackoverflow.com/a/466278). – Ilmari Karonen Jan 29 '17 at 18:26
  • (Also, I just realized that my original code was initializing `step` inefficiently; I've edited my code to fix that.) – Ilmari Karonen Jan 29 '17 at 18:32
  • A standard BS would compute the middle index from L and R indexes. Then decide to go left or right. Your function increases the base by a series of power of 2 (as long as it is still less than). It does not resemble the signature of a standard BS. In this wiki page, in the code of Algorithm section, they makes a distinction between `exponential_search` and `binary_search`. https://en.wikipedia.org/wiki/Exponential_search – cuongptnk Jan 29 '17 at 18:36
  • Placing the check for needle inside while condition or inside the while scope are the same. The idea behind is still the same. Anyway, it is just verbal problem. In my opinion, exponential and binary search (or log search for arbitrary base) are just 2 ways (from 2 directions) to look at the same thing. – cuongptnk Jan 29 '17 at 18:52
  • I'm not sure you're really following what my code does (or did); the first loop in it exists just to initialize the `step` variable, and does no searching of the array. Anyway, I just rewrote it to eliminate the first loop entirely, and hopefully make it look a bit more like the style of binary search code you may be more familiar with. Hopefully, that should also clarify that, as written, it's *not* an exponential search. – Ilmari Karonen Jan 29 '17 at 19:15
  • Hm. It is up to you then. I talk about the idea in the whole, not about the actually implementation. The reason it is exponential search is because the idea behind it, not because of how the `step` is used. – cuongptnk Jan 29 '17 at 19:19
  • Strictly speaking, it is not exponential search as well. Because in exponential search, double the step and search at the same time. In here, the step has max value, and keep halving. It has "exponential spirit" because the base is increasing by a series of double values. – cuongptnk Jan 29 '17 at 19:24
  • It has "exponential spirit" because the base is increasing by a series of double values (in reverse order in compare with standard exponential search). – cuongptnk Jan 29 '17 at 19:30
  • 1
    I would still say that it's just a basic binary search. If you're more used to seeing binary search written in terms of `left` and `right` variables, then mentally replace `base` in my code with `left`, and `step` with `(right - left)`. – Ilmari Karonen Jan 29 '17 at 19:47
  • 1
    I see it now. Yup, it is standard BS. Thank you. – cuongptnk Jan 29 '17 at 19:59
0

Here is the code I wrote to perform the task:

enter code here
#include<stdio.h>
#include<stdlib.h>
int first(char *a[26],int low,int high,char *word,int n);
int last(char*a[26],int low,int high,char* word,int n);


int mycomp(char* a,char* b){

  int i = 0;
  while (a[i] == b[i] && a[i] != '\0')
   i++;
   if (a[i] > b[i])
      return 2;
   else if (a[i] < b[i])
     return -2;
   else
     return 0;



}

int main(){

int n,count=0;


scanf("%d",&n);
char dictionery[n][50];
char word[50];
  int i,j;
  for(i=0;i<n;i++)
      scanf("%s",dictionery[i]);

  scanf("%s",word);

 i = first(dictionery[n],0,n-1,word,n);

 if(i==-1)
  printf("No occurrances\n");

  else{
   j = last(dictionery[n],i,n-1,word,n);
   count = j-i+1;
   printf("%d\n",count);

  }

 return 0;
}

int first(char *a[26],int low,int high,char *word,int n){

 if(high >= low){
    int mid = (high+low)/2;
    int d = mycomp(word,a[mid-1]);
    int e = mycomp(word,a[mid]);
    if((mid==0 || d>0)&& e==0)
        return mid;

    else if(e>0)
       return first(*a[],mid+1,high,word,n);
    else
       return first(*a[],low,mid-1,word,n);




 }

 return -1;


}

int last(char* a[26],int low,int high,char* word,int n){

if(high>=low){
    int mid = (high+low)/2;
    int d = mycomp(word,a[mid+1]);
    int e = mycomp(word,a[mid]);

    if((mid==n-1 || d<0)&& e==0 )
       return mid;
    else if(e<0)
       return last(*a[],low,mid-1,word,n);
    else 
       return last(*a[],mid+1,high,word,n);

}
return -1;


}

Is this the most efficient way of performing the task?

204
  • 433
  • 1
  • 5
  • 19