0

I am implementing a binary search. I seem to have solved my problem but I'd like to understand what was going on.

Here is what I was originally doing for my search:

void find_index(double *list, int val, int low, int high, int *target)
{

  int mid;
  
  if (list[low] == val)
  {
    target[0] = low;
    return;
  } 

  if (list[high] == val)
  {
    target[0] = high;
    return;
  }
  
  mid = (int)((double)(low + high) / 2.0);
  if (list[mid] == val || mid == low || mid == high)
  { 
    target[0] = mid;
    return;
  }

  if (list[low] < val && val < list[mid])
  {
    high = mid;
    find_index(list, val, low, high, target);
  }

  if (list[mid] < val && val < list[high])
  {
    low = mid;
    find_index(list, val, low, high, target);
  }


  return;

}

In the code above, list is a sorted array of doubles. I allocated it dynamically in the main,

double *list = malloc((N+1)*sizeof(double));

and val is the value to be searched (guaranteed to be in list). The main calls func1, which calls find_index. Here's the story:

  1. Initially I allocated statically a size1 array, int target[1] into func1 and passed it to find_index. I got a segmentation fault. Tracking the origin of the error with valgrind I found that, in func1, I was using an uninitialized value AND THAT THE UNINITIALIZED VALUE WAS CREATED IN func1. target is the only array I allocate in such function so it had to be it.
  2. I changed the declaration of find_index. Specifically, int *target became int target[1] as I seem to understand from this question that this is the proper way of passing statically allocated arrays (despite it deals with 2D arrays). I got segmentation fault again and the same message from valgrind.
  3. I declared target dynamically in the main int *target = malloc(1*sizeof(int));, passed it to func1 and then to find_index. The way I passed it to both functions is the same as how I was originally doing, i.e., both functions takes as argument a int *target. It worked, no complain from valgrind.

However I am suspicious now so I decided to change the type of find_index to int and declared target as a scalar into find_index itself. To my understanding of recursions, it should be safe.

However, I do not understand what was going on during steps 1-3 and I'd like to, so to prevent future mistakes. My only guess is that, during the numerous recursive calls that find_index implements, it loses track of the address of the array I originally allocated. Is this something that can happen? Under what circumstances? Could it be related to how I pass the array to the functions and/or to where I originally declared?

I have searched but could not find a truly related question.

EDIT: minimal reproducible example shows the situation is more involved

The minimal reproducible example I could build follows:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <math.h>
#include <time.h>

void func1(double **list, double **another_list, int N);
void func2(double **list, double **another_list, int N);
void find_index(double *list, int val, int low, int high, int *target);
void q_sort_with_doublelst(double *ary,double *lst,int low,int high);
int q_partition_with_doublelst(double *ary,double *lst,int low,int high);
void swap_with_doublelst(double *ary,double *lst,int i,int j);

////////////////////////////////////////////////////////////////////////

// quick sort 

void q_sort_with_doublelst(double *ary,double *lst,int low,int high)
{
    int pivotloc;

    if(low<high)
    {
        pivotloc=q_partition_with_doublelst(ary,lst,low,high);
        q_sort_with_doublelst(ary,lst,low,pivotloc-1);
        q_sort_with_doublelst(ary,lst,pivotloc+1,high);
    }
}

int q_partition_with_doublelst(double *ary,double *lst,int low,int high)
{
    int i,pivotloc;
    int pivotkey;

    swap_with_doublelst(ary,lst,low,(low+high)/2);
    pivotkey=ary[low];
    pivotloc=low;

    for(i=low+1;i<=high;i++)
    {
        if(ary[i]<pivotkey) swap_with_doublelst(ary,lst,++pivotloc,i);
    }
    swap_with_doublelst(ary,lst,low,pivotloc);
    return pivotloc;
}

void swap_with_doublelst(double *ary,double *lst,int i,int j)
{
    int tmp_ary;
    double tmp_lst;

    tmp_ary=ary[i]; tmp_lst=lst[i];
    ary[i]=ary[j]; lst[i]=lst[j];
    ary[j]=tmp_ary; lst[j]=tmp_lst;
}


////////////////////////////////////////////////////////////////////////


void func2(double **list, double **another_list, int N)
{

  int i, j;
  double x;

  for(i=0; i<=N; i++)
  {
    j = (int) lrand48()%(N+1);
    x = lrand48();
    list[0][i] = j;
    list[1][i] = x;

    j = (int) lrand48()%(N+1);
    x = lrand48();
    another_list[0][i] = j;
    another_list[1][i] = x;

  }




}

void func1(double **list, double **another_list, int N)
{

  int i, j;
  int binary_index[1];

  double x, y;

  for(i=0; i<=N; i++)
  {

    find_index(list[0], i, 0, N, binary_index); 

    j = binary_index[0];

    x = another_list[0][j];
    y = another_list[1][j];

  }
  return;
}





void find_index(double *list, int val, int low, int high, int *target)
{

  int mid;
  
  if (list[low] == val)
  {
    target[0] = low;
    return;
  } 

  if (list[high] == val)
  {
    target[0] = high;
    return;
  }
  
  mid = (int)((double)(low + high) / 2.0);
  if (list[mid] == val || mid == low || mid == high)
  { 
    target[0] = mid;
    return;
  }

  if (list[low] < val && val < list[mid])
  {
    high = mid;
    find_index(list, val, low, high, target);
  }

  if (list[mid] < val && val < list[high])
  {
    low = mid;
    find_index(list, val, low, high, target);
  }

  return;

}


int main (int argc, char **argv)
{

  int i, N;
  int seed=time(0);
  srand48(seed);

  N = 1000000;


  double **list = (double **)malloc(2*sizeof(double *)); 
  list[0] = (double *)malloc((N+1)*sizeof(double));                       // contains only ints
  list[1] = (double *)malloc((N+1)*sizeof(double));                       // contains double

  double **another_list = (double **)malloc(2*sizeof(double *)); 
  another_list[0] = (double *)malloc((N+1)*sizeof(double));               // contains only ints
  another_list[1] = (double *)malloc((N+1)*sizeof(double));               // contains double

  for(i=0; i<=N; i++) list[0][i] = i;

  while(1 < 2)
  {
    func2(list, another_list, N);
    q_sort_with_doublelst(list[0], list[1], 0, N);
    func1(list, another_list, N);
  }

  free(list[0]);
  free(list[1]);
  free(list);

  free(another_list[0]);
  free(another_list[1]);
  free(another_list);

  return 0;

}

I tried to reproduce the error by deterministically filling list[0] with the for loop in the main (line 173), so to have it already sorted. It turns out it does not crash. The code I am posting is more similar to my true code, but now I am afraid (as comments suggest) that I have a problem with recursions. A summary of what the minimal example does (very similarly to my code):

  1. Fill list[0] with random integers in [0, N]. Fill list1 with random double.
  2. Fill another_list in the same way.
  3. Sort list[0] by preserving the relation with list[1].
  4. Binary-search list[0] and use the index found to take values from list[1] and another_list.

There is a difference between this example and my true code: here list[0] is not guaranteed to contain the value I am searching. Further, here list[0] may contain duplicates. In principle the search is written with the idea that the value searched needs not to be present and the idea that there are no duplicates. Thus, I would say that if the value is not present, the search should give me the closest value found, but I have no clue to what happens if there are duplicates. I guess it should give me the index of one the possible duplicates, but I am definitely unsure. In my original code there are no duplicates.

I am afraid the present code crashes because of the presence of duplicates. Still, it would be great if someone can identify a different problem. Let me stress that valgrind gives me the same exact message: the error should with target, as it is the only array allocated in func1.

GRquanti
  • 527
  • 8
  • 23
  • Is there a reason `list` is of type `double` whereas `val` is `int`? – Zois Tasoulas Feb 04 '21 at 05:32
  • Post something runnable, that reproduces the error when you run it. – user2357112 Feb 04 '21 at 05:34
  • Yes. ````List```` needs to be of type double because it's part of a 2d dynamic array ````(double **)true_list = malloc(2*sizeof(double*))````, ````true_list[0, 1] = malloc((N+1)*sizeof(double))````. I pass to ````find_nearest```` only ````true_list[0]```` and it contains only integers (encoded as ````double```` of course), but ````true_list[1]```` needs double. Good observation toh. Could it be related? – GRquanti Feb 04 '21 at 05:35
  • @user2357112supportsMonica As I solved my problem by myself, I thought it was not necessary. Going to (try to) build the example right now. – GRquanti Feb 04 '21 at 05:37
  • Why not have the function return the index value instead of returning `void` and modifying `target[0]`? It's a somewhat odd way of writing the function. Many people would write `*target` rather than `target[0]` since it is a single integer, more than an array. – Jonathan Leffler Feb 04 '21 at 05:39
  • We need to see, at the very last, how `target` gets its value in the code that doesn't work. – David Schwartz Feb 04 '21 at 05:39
  • @JonathanLeffler it is what I am doing in the end. I originally did it in this way because I may need to generalize and return both the index of ````list```` and its value. The plan was to generalize to ````int target[2]````. – GRquanti Feb 04 '21 at 05:41
  • 1
    Since the values in the list are of type `double`, it isn't clear that using `int target[2]` will work very well. However, you have a reason — that's 90% of the battle. It isn't formally wrong; it is just unusual. You might also note that comparing `double` values for equality is fraught. You might need to use a "within some tolerance". Again, that can probably wait for another time. How do you report the absence of a match in the array? It isn't clear that you handle that case. And with arrays of `doubles` and no tolerance, it is quite likely that there'll be no match in the array. – Jonathan Leffler Feb 04 '21 at 05:42
  • @JonathanLeffler good point about comparing doubles. The absence of a match is impossible (in theory). – GRquanti Feb 04 '21 at 05:47
  • 1
    You'll need good testing backed up by active assertions to fully justify the "absence of a match is impossible". Your line `mid = (int)((double)(low + high) / 2.0);` is a bit suspect. You might want to read [Extra, Extra — Read All About It: Nearly All Binary Searches and Mergesorts are Broken](https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html) sometime, though it's unlikely your arrays are big enough to suffer from the problem described there. However, you should be able to do pure integer arithmetic in the expression. There is no good reason to use floating point. – Jonathan Leffler Feb 04 '21 at 05:56
  • 1
    You should probably be using `low = mid + 1;` and `high = mid - 1;` to ensure that the `mid` value is excluded from further searches. Otherwise, you are not guaranteed that your search will terminate. I see you are encoding integers in your double array. That's an odd thing to be doing. Why not use an array of integers? Then you have less problem with the extended target array for index and value. – Jonathan Leffler Feb 04 '21 at 06:02
  • @JonathanLeffler ````List```` needs to be of type double because it's part of a 2d dynamic array ````(double **)true_list = malloc(2*sizeof(double*))````, ````true_list[0, 1] = malloc((N+1)*sizeof(double))````. I pass to ````find_nearest```` only ````true_list[0]```` and it contains only integers (encoded as double of course), but ````true_list[1]```` needs double. – GRquanti Feb 04 '21 at 06:05
  • `true_list[0, 1]` is the same as `true_list[1]`. It's not a 2-d index. – Barmar Feb 04 '21 at 06:21
  • If you really assigned `true_list[0, 1]`, then `true_list[0]` is uninitialized. – Barmar Feb 04 '21 at 06:22
  • `int target[1]` and `int *target` are equivalent in a function parameter list. – Barmar Feb 04 '21 at 06:24
  • @Barmar I know. I used the "short notation" to shorten the comment. I do ````true_list[0] = malloc((N+1)*sizeof(double)); true_list[1] = malloc((N+1)*sizeof(double));```` – GRquanti Feb 04 '21 at 06:25
  • And you should be able to pass either a statically declared array or dynamically allocated pointer to it with either declaration. – Barmar Feb 04 '21 at 06:25
  • I think we need to see the whole [mcve] to figure out what you were doing wrong. – Barmar Feb 04 '21 at 06:26
  • @Barmar The third comment, I admit, I am more confused about. I am used to pass ````type *name```` but I have seen in the question linked that they do differently. Good to know it's the same. – GRquanti Feb 04 '21 at 06:27
  • @Barmar working out to build it. Truth is, it turn out the situation is more involved (I need to add other parts of my code to the minimal reproducible example for it to crash; the simple situation I just described does not seem to crash). – GRquanti Feb 04 '21 at 06:29
  • It's different for multidimensional arrays, but you're passing a 1-dimensional array. – Barmar Feb 04 '21 at 06:29
  • Are you aware of structures? You might be better off using a structure to represent the data — but we really need to see an MCVE ([Minimal, Complete, Verifiable Example](https://stackoverflow.com/help/mcve) — or MRE or whatever name SO now uses) or an SSCCE ([Short, Self-Contained, Correct Example](http://sscce.org/)) — the same idea by a different name. – Jonathan Leffler Feb 04 '21 at 06:41
  • @JonathanLeffler yes, I am aware of structures. Here the story becomes kinda ridiculous: my advisor does not use ````struct```` so I am basically forbidden to use them, if I ever want to show him some code and possibly get some help. MCVE posted, but now I am scared. – GRquanti Feb 04 '21 at 06:54
  • I finally got around to poking at the code (sorry it took so long) and I'm puzzled about the code in `func1()` in the full MCVE. When I compile with my default options (warnings treated as errors), the variables `x` and `y` are identified as "set but not used", which is accurate. So I commented the assignments to them (which do nothing and are not used, hence the complaint) and the declarations, but then variable `j` and argument `another_list` are also unused, so I'm left wondering "OK; what is this function meant to be doing?" and I confess I can't guess. Should it be printing something? – Jonathan Leffler Feb 16 '21 at 23:33
  • Incidentally, if C is a focus of your course and the instructor won't use/teach structures, then you've got a lot to leery about. If C is incidental to your course, it may not matter so much, but I'd still be concerned that your instructor won't let you use one of the most important parts of C — structures. – Jonathan Leffler Feb 16 '21 at 23:36
  • In `main()`, you have a loop controlled by `while (1 < 2)`; that condition is always true, so your loop never stops. Is that intentional? If so, use the conventional `while (1)` (or perhaps `for (;;)`) infinite loop. If not intentional, that may be a factor somewhere along the line. Maybe you intended to use `for (int j = 0; j < 2; j++)` to iterate the loop twice. – Jonathan Leffler Feb 16 '21 at 23:47
  • @JonathanLeffler Thanks! Your considerations are basically correct: `func1` in the MCVE is simply where the error takes place, but it does nothing in it. I was indeed puzzled on the reason why I needed it for the MCVE to truly crash. `while(1<2)` was to make the code run until the error, `while(1)` would have been better. – GRquanti Feb 17 '21 at 06:31
  • However, I'm still kinda puzzled. I have temporary left the project aside and I'm thinking to it. I currently think that the true problem is the binary search I wrote: "does my `list[0]` truly contains the numbers that I ask to search?" "What happens if they are not in there?". I'll investigate these questions directly when I'll be back on this project for good. I'm doing these hypotheses as here's the difference between real code and MCVE, but I need my brain to think to something else for a while. – GRquanti Feb 17 '21 at 06:35
  • Confimed :( The issue was that the binary search work only if the searched value only is in the array and it is not true that it always happens in my code. It is correct that it is not guaranteed, I just did not realize it. This also explains why I needed random numbers in the MCVE: this way it happens that targets are not in the list to be searched. Sorry waisting your time. Should I edit the question? – GRquanti Feb 23 '21 at 03:39

0 Answers0