0

I have memory allocated using malloc for a hashtable implementation. The code works well when testing it, but breaks when I use valgrind to profile memory.

I have printed the malloc'd memory locations to see where the offending memory is. During normal testing, the memory allocated is similar to the memory being searched. However, when running valgrind, there is a memory location that is not from the expected list.

The hashtable structure is shown below. Since I am implementing a chained hashtable. The implementation is such that the hashtable buckets are made up of linkedlist pointers and the linkedlist node data contains a pointer to a hashtable_entry which contains the key and value pair.

typedef struct HTBL_ENTRY
{
    void* key;
    void* data;
    int32_t (*key_compare)(const void *a, const void *b);
} hashtable_entry;

typedef struct HASHTABLE_
{
    size_t buckets;            //! Number of buckets in the table
    size_t size;               //! The size of table
    enum HTYPES ktype;               //! The type of the data of the keys
    enum HTYPES dtype;               //! The type of the data of the values
    linkedlist** table;           //! An array list for storing linked list pointers.
    set_t* keys;                //! A set of all the keys in the table.
    int32_t collisions;         //! For diagnostic information on the collision
    int32_t (*key_compare)(const void *a, const void *b);
    void (*key_deallocate)(void *key);
    int32_t (*key_hash)(void* key, size_t buckets);
} hashtable;

The linkedlist structure and the list node are:

typedef struct NODE
{
    struct NODE *next;
    struct NODE *prev;
    void *data;
} node;

typedef struct LINKEDLIST
{
    size_t step;
    int32_t size;
    node* head;
    node* tail;
} linkedlist;

The function that allocates the memory. I use the print statements just for observing the memory locations being accessed:

hashtable_entry* hashtable_set_entry(hashtable* htbl, hashtable_entry *entry, void* key, void* data)
{
    size_t ksize = get_htype_size(htbl->ktype);
    size_t dsize = get_htype_size(htbl->dtype);
    if(entry == NULL)
    {
        entry = (hashtable_entry *)malloc(sizeof(hashtable_entry));
        entry->data = malloc(dsize);
        entry->key = malloc(ksize);

        printf("hashtable_set_entry: entry is %ld\n", entry);
        printf("hashtable_set_entry: entry->key is %ld\n", entry->key);
    }

    entry->key_compare = htbl->key_compare;

    memcpy(entry->key, key, ksize);
    memcpy(entry->data, data, dsize);

    return entry;
}

The function that looks up the the values:

hashtable_entry* hashtable_lookup_entry(hashtable* htbl, void* key)
{
    int32_t bucket = htbl->key_hash(key, htbl->buckets);
    linkedlist* list = htbl->table[bucket];    
    hashtable_entry* entry = NULL;

    node* nd = list->head;
    int32_t i = 0;
    while(nd)
    {
        entry = *(hashtable_entry **)nd->data; 

        printf("hashtable_lookup_entry: entry is %ld\n", entry);
        printf("hashtable_lookup_entry: entry->key is %ld\n", entry->key);

        if(htbl->key_compare(entry->key, key) == 0)
        {
            break;
        }

        nd = nd->next;
        i++;
    }

    // Check that we actually found the data
    return nd ? *(hashtable_entry **)nd->data : NULL;
}

The memory location hashtable_lookup_entry: entry is 4539628424584864688 should not be in the list. And since it is not there when not running memory profiling I cannot figure out why it shows up with valgrind.

hashtable_lookup_entry: entry->key is 210329312
hashtable_lookup_entry: entry is 210347488
hashtable_lookup_entry: entry->key is 210347680
hashtable_lookup_entry: entry is 4539628424584864688
==4142== Invalid read of size 8
==4142==    at 0x485598C: hashtable_lookup_entry (hashtable.c:242)
==4142==    by 0x4855C67: hashtable_insert (hashtable.c:194)
==4142==    by 0x48509A1: hdbscan_compute_hierarchy_and_cluster_tree (hdbscan.c:600)
==4142==    by 0x48512A8: hdbscan_do_run (hdbscan.c:255)
==4142==    by 0x402356: main (tester.c:102)
==4142==  Address 0x3f0000000ba5a3b0 is not stack'd, malloc'd or (recently) free'd
==4142== 
==4142== 
==4142== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==4142==  General Protection Fault
==4142==    at 0x485598C: hashtable_lookup_entry (hashtable.c:242)
==4142==    by 0x4855C67: hashtable_insert (hashtable.c:194)
==4142==    by 0x48509A1: hdbscan_compute_hierarchy_and_cluster_tree (hdbscan.c:600)
==4142==    by 0x48512A8: hdbscan_do_run (hdbscan.c:255)
==4142==    by 0x402356: main (tester.c:102)
==4142== 
==4142== HEAP SUMMARY:
==4142==     in use at exit: 48,869,452 bytes in 8,103 blocks
==4142==   total heap usage: 11,587 allocs, 3,484 frees, 138,983,752 bytes allocated
==4142== 
==4142== 2,128 bytes in 7 blocks are possibly lost in loss record 33 of 53
==4142==    at 0x483AB1A: calloc (vg_replace_malloc.c:762)
==4142==    by 0x4012391: allocate_dtv (dl-tls.c:286)
==4142==    by 0x4012D01: _dl_allocate_tls (dl-tls.c:532)
==4142==    by 0x4A4C21E: allocate_stack (allocatestack.c:621)
==4142==    by 0x4A4C21E: pthread_create@@GLIBC_2.2.5 (pthread_create.c:669)
==4142==    by 0x4A24FA5: gomp_team_start (team.c:836)
==4142==    by 0x4A1B4B0: GOMP_parallel (parallel.c:169)
==4142==    by 0x484E78B: distance_compute (distance.c:293)
==4142==    by 0x48513B4: hdbscan_run (hdbscan.c:311)
==4142==    by 0x402356: main (tester.c:102)
==4142== 
==4142== 53,952 (160 direct, 53,792 indirect) bytes in 4 blocks are definitely lost in loss record 48 of 53
==4142==    at 0x483880B: malloc (vg_replace_malloc.c:309)
==4142==    by 0x4855BD9: hashtable_set_entry (hashtable.c:418)
==4142==    by 0x4855CE7: hashtable_insert (hashtable.c:198)
==4142==    by 0x48509A1: hdbscan_compute_hierarchy_and_cluster_tree (hdbscan.c:600)
==4142==    by 0x48512A8: hdbscan_do_run (hdbscan.c:255)
==4142==    by 0x402356: main (tester.c:102)
==4142== 
==4142== LEAK SUMMARY:
==4142==    definitely lost: 160 bytes in 4 blocks
==4142==    indirectly lost: 53,792 bytes in 16 blocks
==4142==      possibly lost: 2,128 bytes in 7 blocks
==4142==    still reachable: 48,813,372 bytes in 8,076 blocks
==4142==         suppressed: 0 bytes in 0 blocks
==4142== Reachable blocks (those to which a pointer was found) are not shown.
==4142== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==4142== 
==4142== For lists of detected and suppressed errors, rerun with: -s
==4142== ERROR SUMMARY: 252 errors from 10 contexts (suppressed: 0 from 0)
Segmentation fault (core dumped)
  • 1
    The line `entry = *(hashtable_entry **)nd->data;` looks really suspicious what is the type of `node`? – Inian Aug 05 '19 at 09:35
  • 2
    Casting that data pointer via multi-indirection `hashtable_entry**`, then immediately dereferencing it is a pungent code smell. Unless you seriously know what you're doing (and usually even then), considering casting in C to be a red flag. That includes casting malloc (i.e. `(hashtable_entry *)malloc(....)`, [which shouldn't be done in C programs](https://stackoverflow.com/questions/605845/do-i-cast-the-result-of-malloc). Please update your question with a [mcve]. As currently presented, there is little we can do to help, and odds are valgrind is correct. – WhozCraig Aug 05 '19 at 09:35
  • My guess is that "entry" doesn't exist. – Owl Aug 05 '19 at 12:16
  • @Inian that line is ok. ```nd->data``` is a pointer to hashtable_entry pointer. I have edited the question to include the necessary structures. – Onalenna Junior Makhura Aug 05 '19 at 18:16
  • @WhozCraig @Owl The reason I am casting and dereferencing is because nd->data is declared a ```void *``` and it contains a pointer to a ```hashtable_entry```. What baffles me is that that outside memory access only happens during valgrind run. – Onalenna Junior Makhura Aug 05 '19 at 18:21
  • `hashtable_set_entry` doesn't actually set anything in the hashtable. – n. m. could be an AI Aug 05 '19 at 19:18
  • @n.m. yes. It does not. That is because it is designed to either create a new ```hashtable_entry``` or change the values. There is another functions that handles the insertion where necessary. – Onalenna Junior Makhura Aug 05 '19 at 20:51
  • So you want us to figure out where is the bug in *that* function? – n. m. could be an AI Aug 06 '19 at 02:49

0 Answers0