0

I am attempting to find a user input string in a pre-sorted array. If I write my own binary search function, the input is found correctly. If I use the C bsearch, I always get a NULL pointer.

This is the relevant code snippet:

printf(bsearch(&input, *words, curr_idx + 1, max_len,
               (int (*)(const void *, const void *))strcmp) ?
                        "YES" : "NO");

char input[max_len] is the result of scanf("%s", input); uppercase(input);

char **words is a pre-sorted array of uppercase strings

int curr_idx is the max index of words

int max_len is the max length, in bytes, of the words in words (currently 18)

I've tried inputting strings I know are in the array, as well as strings I know are NOT in the array, and every case returns a NULL pointer.

Setting a breakpoint in gdb and examining the contents of input and words, it doesn't appear that anything is incorrect:

(gdb) print (char*)input
$5 = 0x7fffffffe610 "STONE"

(gdb) print words[150980]
$6 = 0x555555bf45a0 "STONE"

EDIT TO ADD MCVE:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

char **words;
char *dictionary[5] = { "STOMPS", "STONABLE", "STONE", "STONEBOAT", "STONEBOATS" };
int curr_idx = 4;
int max_len = 18;

int compare(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

void uppercase(char *input)
{
    char *t = input;
    while (*t) {
        *t = toupper((unsigned char)*t);
        t++;
    }
}

int main()
{
        words = malloc((curr_idx + 1) * sizeof(char *));
        int i;
        for (i = 0; i < 5; i++){
               // words[i] = malloc(sizeof(char) * max_len);
               words[i] = dictionary[i];
        }

        char input[max_len];

    printf("Enter a word: ");
    scanf("%s", input);
    uppercase(input);
    printf(bsearch(input, words, curr_idx + 1, sizeof(*words), compare) ?
               "YES\n" :
               "NO\n");
}

The malloc() bit is unnecessary, but meant to replicate the original program as closely as possible.

Justin
  • 64
  • 1
  • 12
  • `uppercase(scanf("%s", input))` is very wrong. Why don't you pass `max_len` instead of `sizeof(char*)` there? – KamilCuk Apr 13 '19 at 23:27
  • @KamilCuk edited to clarify `input`. I have tried `sizeof(char*)` as well, with the same result. – Justin Apr 13 '19 at 23:31
  • 1
    Why do you pass `*words`? Try something along `printf(bsearch(input, words, curr_idx, sizeof(*words), words_strcmp) != NULL ? "YES" : "NO");` where `int words_strcmp(const void *a, const void *b) { return strcmp(a, b) != NULL; }`. Don't typecast pointers, and don't typecast function pointers, that's undefined behavior, that will never work (and if it will work, it will stop someday, out of no reason). – KamilCuk Apr 13 '19 at 23:34
  • 2
    Never cast function pointers. Also, post a [mcve]. – melpomene Apr 13 '19 at 23:39
  • I have also tried `printf(bsearch(input, words, curr_idx + 1, sizeof(*words), compare) ? "YES" : "NO");` where `compare` is defined as: `int compare(const void *a, const void *b) { return strcmp((const char *)a, (const char *)b); }` – Justin Apr 13 '19 at 23:39
  • 1
    @KamilCuk your "corrected" `words_strcmp` is wrong, it needs to return -/0/+ as strcmp does; the `!= NULL` is a mistake. – hobbs Apr 13 '19 at 23:40
  • @melpomene Thanks for your comment; I have edited to add a directly-compilable example that produces the same results. – Justin Apr 13 '19 at 23:59
  • 2
    You have UB in your code. `curr_idx + 1 * sizeof(char *)` is `curr_idx + (1 * sizeof(char *))` – KamilCuk Apr 14 '19 at 00:03
  • 1
    [Don't cast malloc()](https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc). – melpomene Apr 14 '19 at 00:04
  • 1
    `words[i] = (char *)malloc(..); words[i] = ...` is a memory leak. You've just overwritten the pointer returned by `malloc`. – melpomene Apr 14 '19 at 00:05
  • @KamilCuk Thanks for noticing that; I just threw this example together in a rush. Anyway, that isn't the source of the issue. The actual code mallocs correctly, and adding parens to the example still produces the same results. – Justin Apr 14 '19 at 00:05
  • Don't use `scanf` for user input. If you use `scanf`, limit the length of allowed inputs (i.e. don't use `%s`) and always check its return value. – melpomene Apr 14 '19 at 00:08
  • @melpomene Your comments on my code style in this example aren't super helpful, as they don't directly relate to the source of the issue. In the actual code, I'm using fgets() and passing the malloc'd pointer. Can you either focus on the issue, or just pass on this one? – Justin Apr 14 '19 at 00:09
  • It's not a MCVE if it requires user input. You don't even say what the input should be. Preferably you'd just hardcode all inputs to avoid us having to guess. – melpomene Apr 14 '19 at 00:09
  • My comments so far have been on actual bugs in your code. This is not a "code style" issue. Besides, if you don't want comments on code that doesn't directly relate to your issue, don't post code that doesn't relate to your issue. Please focus on the issue at hand (the M in MCVE). – melpomene Apr 14 '19 at 00:11
  • casting malloc doesn't create any issues. Casting a function pointer to a compatible type is "undefined behavior" but doesn't create any issues. You asked for a minimal example, so I created an example that produces the same behavior without copy-pasting the entire source. In this case, user input is necessary. You can decide to input any word. Obviously words not in the list should produce NO, and words in the list should produce YES. – Justin Apr 14 '19 at 00:14

2 Answers2

2

Here's a reduced version of your code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int compare(const void *a, const void *b)
{
    return strcmp(a, b);
}

int main(void)
{
    const char *words[] = { "A" };
    puts(bsearch("A", words, 1, sizeof *words, compare) ?
            "YES" :
            "NO");
}

The problem is that bsearch calls your compare function with a pointer to the current array element (as the second argument, that is; the first argument is always the key pointer given to bsearch as its first argument).

Your array elements are pointers (char *), so compare receives a pointer to pointer to char. To make strcmp work, you need to dereference that pointer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int compare(const void *a, const void *b)
{
    const char *const *pstr = b;
    return strcmp(a, *pstr);
}

int main(void)
{
    const char *words[] = { "A" };
    puts(bsearch("A", words, 1, sizeof *words, compare) ?
            "YES" :
            "NO");
}
melpomene
  • 84,125
  • 8
  • 85
  • 148
  • Thanks, this one was actually useful. I didn't fully understand the behavior of bsearch in this case. – Justin Apr 14 '19 at 00:26
0

Your comparison function is wrong. You're comparing entries in an array of character pointers, so the comparison function is passed two char ** values disguised as const void *. Your comparison code needs to react accordingly.

int compare(const void *v1, const void *v2)
{
    const char *s1 = *(char **)v1;
    const char *s2 = *(char **)v2;
    // printf("[%s] <=> [%s] = %d\n", s1, s2, strcmp(s1, s2));
    return strcmp(s1, s2);
}

The bsearch function then needs to be called correctly, too:

char *key = input;
bsearch(&key, words, 5, sizeof(*words), compare);

The advantage of this comparison function over some others is that the same comparator can be used with qsort() to order the data in the array, which ensures that the comparisons are … err … comparable. You can devise other ways of driving bsearch() (see melpomene's answer) because bsearch() always passes the key as the first argument to the comparison function. But the ability to use the same comparator for both sorting and searching (rather than requiring two different functions) seems to me to outweigh the benefit of not always dereferencing the key.

Working code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

static char **words;
static char *dictionary[] = { "STOMPS", "STONABLE", "STONE", "STONEBOAT", "STONEBOATS" };
static int curr_idx = 4;
static int max_len = 18;

#if 0
int compare(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

#endif

static int compare(const void *v1, const void *v2)
{
    const char *s1 = *(char **)v1;
    const char *s2 = *(char **)v2;
     printf("%s(): [%s] <=> [%s] = %d\n", __func__, s1, s2, strcmp(s1, s2));
    return strcmp(s1, s2);
}

static void uppercase(char *input)
{
    char *t = input;
    while (*t)
    {
        *t = toupper((unsigned char)*t);
        t++;
    }
}

int main(void)
{
    words = malloc((curr_idx + 1) * sizeof(char *));
    int i;
    for (i = 0; i < curr_idx + 1; i++)
    {
        // words[i] = malloc(sizeof(char) * max_len);
        words[i] = dictionary[i];
    }

    char input[max_len];

    char fmt[10];
    snprintf(fmt, sizeof(fmt), "%%%ds", max_len - 1);
    while (printf("Enter a word: ") > 0 && scanf(fmt, input) == 1)
    {
        uppercase(input);
        char *key = input;
        char **result = bsearch(&key, words, curr_idx + 1, sizeof(*words), compare);
        if (result != 0)
            printf("Key [%s] found at %p [%s]\n", input, result, *result);
        else
            printf("Key [%s] not found\n", input);
    }
    putchar('\n');

    return 0;
}

Sample run (program name bs11):

$ ./bs11
Enter a word: abracadabra
compare(): [ABRACADABRA] <=> [STONE] = -18
compare(): [ABRACADABRA] <=> [STONABLE] = -18
compare(): [ABRACADABRA] <=> [STOMPS] = -18
Key [ABRACADABRA] not found
Enter a word: STOMPS
compare(): [STOMPS] <=> [STONE] = -1
compare(): [STOMPS] <=> [STONABLE] = -1
compare(): [STOMPS] <=> [STOMPS] = 0
Key [STOMPS] found at 0x7fa1e4402a70 [STOMPS]
Enter a word: stona
compare(): [STONA] <=> [STONE] = -4
compare(): [STONA] <=> [STONABLE] = -66
compare(): [STONA] <=> [STOMPS] = 1
Key [STONA] not found
Enter a word: STONABLE
compare(): [STONABLE] <=> [STONE] = -4
compare(): [STONABLE] <=> [STONABLE] = 0
Key [STONABLE] found at 0x7fa1e4402a78 [STONABLE]
Enter a word: stonable
compare(): [STONABLE] <=> [STONE] = -4
compare(): [STONABLE] <=> [STONABLE] = 0
Key [STONABLE] found at 0x7fa1e4402a78 [STONABLE]
Enter a word: stonc
compare(): [STONC] <=> [STONE] = -2
compare(): [STONC] <=> [STONABLE] = 2
Key [STONC] not found
Enter a word: STONE
compare(): [STONE] <=> [STONE] = 0
Key [STONE] found at 0x7fa1e4402a80 [STONE]
Enter a word: stobneage
compare(): [STOBNEAGE] <=> [STONE] = -12
compare(): [STOBNEAGE] <=> [STONABLE] = -12
compare(): [STOBNEAGE] <=> [STOMPS] = -11
Key [STOBNEAGE] not found
Enter a word: STONEBOAT
compare(): [STONEBOAT] <=> [STONE] = 66
compare(): [STONEBOAT] <=> [STONEBOATS] = -83
compare(): [STONEBOAT] <=> [STONEBOAT] = 0
Key [STONEBOAT] found at 0x7fa1e4402a88 [STONEBOAT]
Enter a word: stoneboatatdawn
compare(): [STONEBOATATDAWN] <=> [STONE] = 66
compare(): [STONEBOATATDAWN] <=> [STONEBOATS] = -18
compare(): [STONEBOATATDAWN] <=> [STONEBOAT] = 65
Key [STONEBOATATDAWN] not found
Enter a word: STONEBOATS
compare(): [STONEBOATS] <=> [STONE] = 66
compare(): [STONEBOATS] <=> [STONEBOATS] = 0
Key [STONEBOATS] found at 0x7fa1e4402a90 [STONEBOATS]
Enter a word: zoo
compare(): [ZOO] <=> [STONE] = 7
compare(): [ZOO] <=> [STONEBOATS] = 7
Key [ZOO] not found
Enter a word: ^D
$
Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278