5

using bsearch() in C (standard library) one can quickly find an entry in a sorted array.

However, how do I calculate where to insert a new entry (using the standard library)?

bsearch() specifically checks whether the found item's key is equal to the passed key and if it isn't, it returns NULL - so can't use that.

alk
  • 69,737
  • 10
  • 105
  • 255
  • where do you want to insert a new entry into your array? are you actually trying to insert a new entry (i.e. make the array one entry bigger) or are you overwriting an existing entry? – bph Jun 28 '12 at 12:59
  • Yes, I want to make the array bigger and scroll the existing elements to the right of the insertion point to the right, then put the new element at the insertion point. I can feel insertion sort grinning at me :-) However, I'm tyring to use more of the existing functions instead of implementing it myself. – Danny Milosavljevic Jun 28 '12 at 13:03
  • 1
    ok - this sort of thing is a bit of a pain in C unlike other langs where dynamic arrays are built in. You could use a 3rd party dynamic array implementation e.g. http://uthash.sourceforge.net/utarray.html. This allows you to do stuff like append, pop, sort search etc. a bit like other languages e.g. python lists – bph Jun 28 '12 at 13:05
  • 1
    if you want to do insertion yourself you can use std library functions like memcpy, memove to manipulate your arrays (think of them as blocks of contiguous memory) allowing you to insert new data – bph Jun 28 '12 at 13:10
  • Hiett: Thanks :-) I use memmove, just need to find out where to start copying. uthash looks promising. – Danny Milosavljevic Jun 28 '12 at 13:20
  • 1
    I have used it successfully, I liked it as it is self contained within its own header file and not an additional library that needs linking to. Coming from python I found C arrays very limiting and didn't want to reinvent the dynamic array 'wheel' so this fitted the bill for me – bph Jun 28 '12 at 13:22

5 Answers5

5

Its not clear from the question but possibly this is what you want:

You can do something like this to find the index in the array where bsearch () found the match.

if (bsearch_returned_address != NULL)
  index = (bsearch_returned_address - array_base_address)

EDIT

To know the location which the bsort last visited, check the below stuff out.

The good thing is that the manual says:

The compar routine is expected to have two arguments which point to the key object and to an array member, in that order, and should return an integer less than, equal to, or greater than zero if the key object is found, respectively, to be less than, to match, or be greater than the array member.

Therefore you can store the second argument inside the comparison function in a global variable, and in a case of the fail use the address in this variable, which points to the last location the bsearch function visited to find for a match.

For example:

A list with address and value:

[0x8d6010: 0][0x8d6014: 4][0x8d6018: 8][0x8d601c: 12][0x8d6020: 16][0x8d6024: 20][0x8d6028: 24][0x8d602c: 28][0x8d6030: 32][0x8d6034: 36]

Value to search

13

output

fmem: (nil)  //this is the memory location where it was found
last_mem1: 0x7fff8c8e6c54 //last val of 1st param of compare
last_mem2: 0x8d601c //last val of 2nd param of compare
*last_mem1: 13,        *last_mem2: 12

The sample compare function code is

static const int *last_mem1, *last_mem2;

static int
compmi(const void *a, const void *b)
{
    last_mem1 = a; last_mem2 = b;
    return *(int *)a - *(int *)b;
}

So you can insert after the address in last_mem2. Although there are terminal cases, if you find a key which is less than the first element, then last_mem2 will also have the address of the first element.

But how ever you have to shift the array elements to make place for the insertion, which will make the insertion complexity to O(n). You might want to improve performance by introducing some kind of lazy insertion, like make a separate unordered list, which is much smaller than your original list, and dump new elements there. When searching, perform bsearch in original list, and linear search in the dump. When the dump list grows past a certain threshold, you can merge the list by performing an insertion sort. But still, you can't be O(lg n).

phoxis
  • 60,131
  • 14
  • 81
  • 117
  • Yes, but that only works if the item is already in the collection. I want to put a new item into the collection (at the right place). – Danny Milosavljevic Jun 28 '12 at 13:02
  • 1
    You can consider static global and isolate the function and relatives to a file. – phoxis Jun 28 '12 at 13:49
  • Hey, that's very useful! While global variables aren't my cup of tea usually, in this case it seems ok since I need to lock the array anyway while scrolling it. Thanks! – Danny Milosavljevic Jun 28 '12 at 13:55
  • 1
    This works except you must also store the comparison result. If the comparison result is > 0, then you must increment the index by one. – Dustin Jan 27 '20 at 19:50
5

Here's an improvement upon @phoxis's answer that will make the code thread-safe and reentrant by avoiding any global variables. The trick is to use the key itself to store the last visited address.

bsearch_insertion.h

#include <stdlib.h>

#ifndef BSEARCH_INSERTION_H
#define BSEARCH_INSERTION_H

/* Just like bsearch(3), but return a pointer to the element after which
 * the key would need to be inserted in order to maintain sorted order. */
void *bsearch_insertion(
    const void *key, const void *base, size_t nel,
    size_t width, int (*compar)(const void *, const void *));

#endif /* BSEARCH_INSERTION_H */

bsearch_insertion.c

#include "bsearch_insertion.h"

typedef struct
{
    const void *key;
    int (*const compar)(const void *, const void *);
    void *last_visited;
} bsearch_insertion_state;

static int bsearch_insertion_compare(const void *a, const void *b)
{
    bsearch_insertion_state *state = (bsearch_insertion_state *) a;
    state->last_visited = (void *) b;
    return state->compar(state->key, b);
}

void *bsearch_insertion(
    const void *key, const void *base, size_t nel,
    size_t width, int (*compar)(const void *, const void *))
{
    bsearch_insertion_state state = {key, compar, NULL};
    bsearch(&state, base, nel, width, bsearch_insertion_compare);
    return state.last_visited;
}

Example: test.c

#include <stdio.h>
#include "bsearch_insertion.h"

static int cmp(const void *a, const void *b)
{
    int aint = *(const int *)a;
    int bint = *(const int *)b;
    return aint - bint;
}

int main(int argc, char **argv)
{
    int data[] = {0, 1, 2, 3, 5};
    int key = 4;
    void *result = bsearch_insertion(
        &key, data, sizeof(data) / sizeof(data[0]), sizeof(data[0]), cmp);
    /* Should print "Insertion point: 3" */
    printf("Insertion point: %d\n", (int *)result - data);
    return 0;
}
Leo
  • 1,135
  • 11
  • 14
  • What if the key is less than everything in the array, so the insertion point is prior to the first element? I haven't tested the code, but I think the first element would be the last comparison made, and would therefore be returned as the insertion point. This case seems indistinguishable from an insertion point between the first two elements in the array. – Brent Baccala Apr 18 '21 at 19:58
3

Not sure what you mean by "calculate insertion place"; you build an array, then sort it using qsort(), then do (many) searches using bsearch().

In other words: for typical usage, you don't need to implement the array-sorting, since the standard library contains functionality for that, too.

Not sure about the connection to bisecting, here.

UPDATE: From the comment, it seems you're concerned about doing inserts into an array that you're also doing searches from. I would recommend looking at some other data structure that is more friendly towards inserts, such as a hash table for instance. By not relying on sorting to keep searches fast, a hash table might perform better. Inserting into an array involves moving all the subsequent elements, which is also quite costly and which is not needed for e.g. a hash table.

UPDATE 2: To actually try to answer your question, assuming you have a bsearch()-comparible comparator() function for your array of n entries, the index for the new item ni should be given by something like this:

size_t i;

for( i = 0; i < n && comparator(&ni, array + i) >= 0; ++i )
  ;
/* Grow array, copy i..n to (i+1)..(n+1), insert ni at i. */
unwind
  • 391,730
  • 64
  • 469
  • 606
  • 1
    What if the collection I want to search in grows over time and I want to keep it sorted? qsort() seems to be a little wasteful then, especially since some qsort implementations take worst-case time O(n²) when the array is already sorted... – Danny Milosavljevic Jun 28 '12 at 13:00
  • 1
    @unwind - wise words, sounds like a hash table would be your friend in this instance, note from my comment above, there is also a hash table implementation - http://uthash.sourceforge.net/ – bph Jun 28 '12 at 13:14
  • About update 2, well yes, but that's quite slow, why not do the same as bsearch() does, just not check for equality at the end? Then it should end up at the right place, and take O(log n) time to do so... – Danny Milosavljevic Jun 28 '12 at 13:16
1

Because insert causes copying of tail of array, time is O(n). So simple linear search won't slow down your code dramatically. You can even copy items during searching, if you start searching from the end of array.

SKi
  • 8,007
  • 2
  • 26
  • 57
0

Further to @Leo's excellent answer: there is one final wrinkle which you'll run into if you try to make this work. I just did:-)

You need to (as @phoxis suggested) also store the result of the last comparison, because if the result of the last comparison between the new key and the array element that was last compared was > 0, the insertion point is one element further on than the array element that was last compared.

So to fix this, add "int last_cmp" to the bsearch_insertion_state structure, sets it inside the comparison function:

int cmp = state->compar(state->key, b);
state->last_cmp = cmp;
return cmp;

and then (this may not be the most elegant method) alter the return type of bsearch_insertion() from void * to bsearch_insertion_state (i.e. return the whole structure), so that the caller may then write, in the test program:

bsearch_insertion_state state = bsearch_insertion(
    &key, data, sizeof(data) / sizeof(data[0]), sizeof(data[0]), cmp);
int *ipptr = state.last_visited;
if( state.last_cmp > 0 ) ipptr++;
int ip = ipptr - data;

// Should print "Insertion point: 4"
printf("Insertion point (insert key at position): %d\n", ip );

[Note: there may be a subtle difference in my definition of the insertion point compared to @Leo's, hence @Leo's version printed 3, mine prints 4 - but according to my understanding, the insertion point is the position at which the new key is inserted, i.e. first you shuffle up elements ip..endofarray up one, and then you store the key via data[ip] = key. For this particular test case, 4 is the correct insertion point.]

I have built my own version of this, with an array of char * strings, and strcmp() for the low-level comparisons, so my "ipptr" was a char **, and written the outer shuffle up and insert code, and I can verify that this appears to work for me, producing correctly sorted strings.