I am implementing a binary search. I seem to have solved my problem but I'd like to understand what was going on.
Here is what I was originally doing for my search:
void find_index(double *list, int val, int low, int high, int *target)
{
int mid;
if (list[low] == val)
{
target[0] = low;
return;
}
if (list[high] == val)
{
target[0] = high;
return;
}
mid = (int)((double)(low + high) / 2.0);
if (list[mid] == val || mid == low || mid == high)
{
target[0] = mid;
return;
}
if (list[low] < val && val < list[mid])
{
high = mid;
find_index(list, val, low, high, target);
}
if (list[mid] < val && val < list[high])
{
low = mid;
find_index(list, val, low, high, target);
}
return;
}
In the code above, list is a sorted array of doubles. I allocated it dynamically in the main,
double *list = malloc((N+1)*sizeof(double));
and val
is the value to be searched (guaranteed to be in list
). The main calls func1
, which calls find_index
. Here's the story:
- Initially I allocated statically a size1 array,
int target[1]
intofunc1
and passed it tofind_index
. I got a segmentation fault. Tracking the origin of the error with valgrind I found that, infunc1
, I was using an uninitialized value AND THAT THE UNINITIALIZED VALUE WAS CREATED INfunc1
.target
is the only array I allocate in such function so it had to be it. - I changed the declaration of
find_index
. Specifically,int *target
becameint target[1]
as I seem to understand from this question that this is the proper way of passing statically allocated arrays (despite it deals with 2D arrays). I got segmentation fault again and the same message from valgrind. - I declared
target
dynamically in the mainint *target = malloc(1*sizeof(int));
, passed it tofunc1
and then tofind_index
. The way I passed it to both functions is the same as how I was originally doing, i.e., both functions takes as argument aint *target
. It worked, no complain from valgrind.
However I am suspicious now so I decided to change the type of find_index
to int
and declared target
as a scalar into find_index
itself. To my understanding of recursions, it should be safe.
However, I do not understand what was going on during steps 1-3 and I'd like to, so to prevent future mistakes. My only guess is that, during the numerous recursive calls that find_index
implements, it loses track of the address of the array I originally allocated. Is this something that can happen? Under what circumstances? Could it be related to how I pass the array to the functions and/or to where I originally declared?
I have searched but could not find a truly related question.
EDIT: minimal reproducible example shows the situation is more involved
The minimal reproducible example I could build follows:
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <math.h>
#include <time.h>
void func1(double **list, double **another_list, int N);
void func2(double **list, double **another_list, int N);
void find_index(double *list, int val, int low, int high, int *target);
void q_sort_with_doublelst(double *ary,double *lst,int low,int high);
int q_partition_with_doublelst(double *ary,double *lst,int low,int high);
void swap_with_doublelst(double *ary,double *lst,int i,int j);
////////////////////////////////////////////////////////////////////////
// quick sort
void q_sort_with_doublelst(double *ary,double *lst,int low,int high)
{
int pivotloc;
if(low<high)
{
pivotloc=q_partition_with_doublelst(ary,lst,low,high);
q_sort_with_doublelst(ary,lst,low,pivotloc-1);
q_sort_with_doublelst(ary,lst,pivotloc+1,high);
}
}
int q_partition_with_doublelst(double *ary,double *lst,int low,int high)
{
int i,pivotloc;
int pivotkey;
swap_with_doublelst(ary,lst,low,(low+high)/2);
pivotkey=ary[low];
pivotloc=low;
for(i=low+1;i<=high;i++)
{
if(ary[i]<pivotkey) swap_with_doublelst(ary,lst,++pivotloc,i);
}
swap_with_doublelst(ary,lst,low,pivotloc);
return pivotloc;
}
void swap_with_doublelst(double *ary,double *lst,int i,int j)
{
int tmp_ary;
double tmp_lst;
tmp_ary=ary[i]; tmp_lst=lst[i];
ary[i]=ary[j]; lst[i]=lst[j];
ary[j]=tmp_ary; lst[j]=tmp_lst;
}
////////////////////////////////////////////////////////////////////////
void func2(double **list, double **another_list, int N)
{
int i, j;
double x;
for(i=0; i<=N; i++)
{
j = (int) lrand48()%(N+1);
x = lrand48();
list[0][i] = j;
list[1][i] = x;
j = (int) lrand48()%(N+1);
x = lrand48();
another_list[0][i] = j;
another_list[1][i] = x;
}
}
void func1(double **list, double **another_list, int N)
{
int i, j;
int binary_index[1];
double x, y;
for(i=0; i<=N; i++)
{
find_index(list[0], i, 0, N, binary_index);
j = binary_index[0];
x = another_list[0][j];
y = another_list[1][j];
}
return;
}
void find_index(double *list, int val, int low, int high, int *target)
{
int mid;
if (list[low] == val)
{
target[0] = low;
return;
}
if (list[high] == val)
{
target[0] = high;
return;
}
mid = (int)((double)(low + high) / 2.0);
if (list[mid] == val || mid == low || mid == high)
{
target[0] = mid;
return;
}
if (list[low] < val && val < list[mid])
{
high = mid;
find_index(list, val, low, high, target);
}
if (list[mid] < val && val < list[high])
{
low = mid;
find_index(list, val, low, high, target);
}
return;
}
int main (int argc, char **argv)
{
int i, N;
int seed=time(0);
srand48(seed);
N = 1000000;
double **list = (double **)malloc(2*sizeof(double *));
list[0] = (double *)malloc((N+1)*sizeof(double)); // contains only ints
list[1] = (double *)malloc((N+1)*sizeof(double)); // contains double
double **another_list = (double **)malloc(2*sizeof(double *));
another_list[0] = (double *)malloc((N+1)*sizeof(double)); // contains only ints
another_list[1] = (double *)malloc((N+1)*sizeof(double)); // contains double
for(i=0; i<=N; i++) list[0][i] = i;
while(1 < 2)
{
func2(list, another_list, N);
q_sort_with_doublelst(list[0], list[1], 0, N);
func1(list, another_list, N);
}
free(list[0]);
free(list[1]);
free(list);
free(another_list[0]);
free(another_list[1]);
free(another_list);
return 0;
}
I tried to reproduce the error by deterministically filling list[0]
with the for loop in the main (line 173), so to have it already sorted. It turns out it does not crash. The code I am posting is more similar to my true code, but now I am afraid (as comments suggest) that I have a problem with recursions. A summary of what the minimal example does (very similarly to my code):
- Fill
list[0]
with random integers in [0, N]. Fill list1 with random double. - Fill
another_list
in the same way. - Sort
list[0]
by preserving the relation withlist[1]
. - Binary-search
list[0]
and use the index found to take values fromlist[1]
andanother_list
.
There is a difference between this example and my true code: here list[0]
is not guaranteed to contain the value I am searching. Further, here list[0]
may contain duplicates. In principle the search is written with the idea that the value searched needs not to be present and the idea that there are no duplicates. Thus, I would say that if the value is not present, the search should give me the closest value found, but I have no clue to what happens if there are duplicates. I guess it should give me the index of one the possible duplicates, but I am definitely unsure. In my original code there are no duplicates.
I am afraid the present code crashes because of the presence of duplicates. Still, it would be great if someone can identify a different problem. Let me stress that valgrind gives me the same exact message: the error should with target
, as it is the only array allocated in func1
.