-1

I'm trying to create an empty hash table of structure:

struct htab
{
   size_t capacity;
   size_t size;
   struct pair *data;
};

The data is an array of linked-lists to struct pair values. These linked-lists contain sentinels (dummy value) as first element.

struct pair
{
   uint32_t hkey;
   char *key;
   void *value;
   struct pair *next;
};

So I wrote this to have a capacity of 4 and size of 0. How could I initialise all cells of the 'data' array to 0?

struct htab *htab_new()
{
   struct htab *newtable = 
   malloc(sizeof(struct htab));
   if (newtable == NULL)
   {
       errx(1, "Not enough memory!");
   }
   newtable->capacity = 4;
   newtable->size = 0;
   newtable->data = calloc(// ??);
   return newtable;
}

Also, how could I test if this actually works?

Uwe Keim
  • 39,551
  • 56
  • 175
  • 291
  • Is the capacity the number of buckets in the hash-table? Otherwise it wouldn't make much sense. – Some programmer dude Dec 10 '18 at 11:26
  • I don't understand the concept of your hashtable implementation yet. Do you mean that your hash table of capacity 4 can have up to 4 different hash values and every hash value can have more than one key/value pair connected to it as a linked list? In this case you wouldn't need the `hkey` value in every list element. – Bodo Dec 10 '18 at 11:32
  • An array of elements. In this specific case, it is an array of 'struct pair' values. Actually, each cell in this array will contain a linked list of 'struct pair' values. For instance, if the array's capacity is four, it will contain four linked lists. These lists will use sentinels. So, when the array is empty (size = 0), it contains only sentinels. Linked lists will be used in order to handle collisions between keys – displayname Dec 10 '18 at 11:34
  • // Be careful, you have to allocate two memory spaces. // - The memory space that holds the 'struct htab' variable. // - The memory space that holds the data. // All cells of the 'data' array must be initialized to zero // (they contain the sentinels of the linked lists.) – displayname Dec 10 '18 at 11:35
  • I tried my best to sum up the instructions but I'm quite lost – displayname Dec 10 '18 at 11:35
  • 1
    So the `capacity` is not the number of possible elements in the table, but the number of *buckets* (please learn the terminology)? Then just allocate an "array" of `capacity` structures that you then initializes accordingly (all being your sentinels). If you want to change the number of buckets (i.e. change the `capacity`) you need to rehash *all* elements already in the table. – Some programmer dude Dec 10 '18 at 11:43
  • Sorry what do you mean by 'bucket' ? Is it just another name for capacity ? Even if your instructions are very clear I do not enough experience to follow them, could you detail a little bit more please – displayname Dec 10 '18 at 11:52
  • capacity is the total number of elements that can be stored in the data array. In other words, it is the length of the data array. – displayname Dec 10 '18 at 11:59
  • @displayname In correct terminology, this is the number of buckets. One bucket can contain more than one element, though. Capacity would normally refer how many different elements can be inserted, which can be higher than the number of buckets. Let's say you put a maximum number of elements per bucket, then `capacity == buckets * max`... – Aconcagua Dec 10 '18 at 12:01
  • Thank you for the clarification :) what can I do now ? – displayname Dec 10 '18 at 12:06

2 Answers2

0

These linked-lists contain sentinels (dummy value) as first element.

newtable->data = calloc(capacity, sizeof(*newtable->data));
// but you should check the result of calloc as well, just as you did for malloc!

Would you discover the slot actually being used by key having assigned a value? You wouldn't then be able to use the null pointer as key. If you use only the next pointer, then why use the struct at all? You could have an array of pointers, the sentinel would then be a null pointer:

struct pair** data;

Interestingly enough, with this, you won't need to change the call to calloc as I presented above (sizeof(*data) now will be size of a pointer...).

Sidenote: See about calloc and pointers...

Aconcagua
  • 24,880
  • 4
  • 34
  • 59
  • Sorry but you changed the structure and I don't want this :( – displayname Dec 10 '18 at 11:54
  • @displayname Changing the structure was just an *alternative* proposition, you don't need to. The call to calloc, as presented, works just absolutely fine even with unchanged structure (in which case you then get all pointer + `hkey` members set to 0). – Aconcagua Dec 10 '18 at 11:56
  • Could you tell me how I can test this function ? i have absolutely no idea – displayname Dec 10 '18 at 12:12
  • 1
    @displayname: Do reconsider. If you used `struct htab { size_t size; struct pair **slot; };` the implementation would be simpler and more efficient. See [here](https://stackoverflow.com/a/46392980/1475978) for an example of mine. To test, I'd create a small test program that used command-line parameters or standard input to add pairs to a hash table, then prints the contents out in e.g. Graphviz DOT format for visual checking. – Nominal Animal Dec 10 '18 at 12:16
  • @displayname For testing, simplest: Use a debugger and inspect the array elements. Or print out every element together with its members to console (and you'll have to trust `calloc` having allocated the appropriate number of elements...). – Aconcagua Dec 10 '18 at 12:18
  • I'd like to print this out, but in order to print every element what shall I do ? loop until newtable->data is what ? sorry I might ask stupid questions but I really learn much faster when having an example, rather than trying to discover the correct answer – displayname Dec 10 '18 at 12:24
  • `for(unsigned int i = 0; i < capacity; ++i) { printf("%p\n", table->data[i].next); }` - other members analogously (you can pack all of them into one single printf). – Aconcagua Dec 10 '18 at 12:33
  • YES!! thank you for your help I have now understood how you did this – displayname Dec 10 '18 at 12:43
0

In a comment, OP mentioned they learn better from an example. (This is not an answer per se, only an example. Please consider this an extended comment, rather than an answer.)

Let's look at a simple, real-world example, then; but something that cannot be just copy-pasted and presented as ones own work.

Let's assume we need a hash table of text or tokens, say

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

struct hashentry {
    struct hashentry   *next;
    size_t              hash;
    unsigned char       data[];
};

struct hashtable {
    size_t              size;
    struct hashentry  **slot;
};

where the table itself is an array of pointers, and hash collisions are solved via chaining. Note that instead of key-value pairs, I am essentially using keys only; this is to avoid having this example code copy-pasted and presented as someones own work. I wrote this to help new programmers understand, not for cheaters to submit as their homework. (I am not referring to OP, mind you. These questions are often found via a web search, and I write these answers for that overall group, not just the asker.)

Table initialization to a specific size:

static inline void hashtable_init(struct hashtable *const ht, const size_t size)
{
    size_t  i;

    if (!ht) {
        fprintf(stderr, "hashtable_init(): No hashtable specified (ht == NULL).\n");
        exit(EXIT_FAILURE);
    } else
    if (size < 1) {
        fprintf(stderr, "hashtable_init(): Invalid hashtable size (size == %zu).\n", size);
        exit(EXIT_FAILURE);
    }

    /* Allocate an array of pointers. */
    ht->slot = calloc(size, sizeof ht->slot[0]);
    if (!ht->slot) {
        fprintf(stderr, "hashtable_init(): Failed to allocate an array of %zu slots.\n", size);
        exit(EXIT_FAILURE);
    }
    ht->size = size;

    /* Mark each slot unused. (On all current architectures, this is unnecessary,
       as calloc() does initialize the pointers to NULL, but let's do it anyway.) */
    for (i = 0; i < size; i++)
        ht->slot[i] = NULL;
}

For a hash function, I like DJB2 Xor variant for text strings. It is not particularly good (there will be collisions), but it is very simple and fast:

static inline size_t  hash_data(const char *data, const size_t datalen)
{
    const char *const ends = data + datalen;
    size_t            hash = 5381;

    while (data < ends)
        hash = (33 * hash) ^ (unsigned char)(*(data++));

    return hash;
}

Note that I use size_t as the type for the hash. You could use any type you want, but on most architectures, it's the same size as a pointer, which is .

To add a data entry to the hash table:

static inline void hashtable_add(struct hashtable *ht, const char *data, const size_t datalen)
{
    struct hashentry *entry;
    size_t            hash, slotnum;

    if (!ht) {
        fprintf(stderr, "hashtable_add(): No hashtable specified (ht == NULL).\n");
        exit(EXIT_FAILURE);
    } else
    if (ht->size < 1) {
        fprintf(stderr, "hashtable_add(): Hashtable has zero size.\n");
        exit(EXIT_FAILURE);
    } else
    if (!data && datalen > 0) {
        fprintf(stderr, "hashtable_add(): data is NULL, but datalen == %zu.\n", datalen);
        exit(EXIT_FAILURE);
    }

    /* Allocate memory for the entry, including the data, and the string-terminating nul '\0'. */
    entry = malloc(sizeof (struct hashentry) + datalen + 1);
    if (!entry) {
        fprintf(stderr, "hashtable_add(): Out of memory (datalen = %zu).\n", datalen);        
        exit(EXIT_FAILURE);
    }

    /* Copy the data, if any. */
    if (datalen > 0)
        memcpy(entry->data, data, datalen);

    /* Ensure the data is terminated with a nul, '\0'. */
    entry->data[datalen] = '\0';

    /* Compute the hash. */
    hash = hash_data(data, datalen);
    entry->hash = hash;

    /* The slot number is the hash modulo hash table size. */
    slotnum = hash % ht->size;

    /* Prepend entry to the corresponding slot chain. */
    entry->next = ht->slot[slotnum];
    ht->slot[slotnum] = entry;
}

When I initially write code like the above, I always write it as a test program, and test it. (This technically falls under the unit testing paradigm.)

In this case, we can simply take the number of slots as a command-line parameter, and read each line from standard input as data to be added to the hash table.

Because standard C does not implement getline(), we'd better use fgets() instead, with a fixed-size line buffer. If we declare

#ifndef  MAX_LINE_LEN
#define  MAX_LINE_LEN  16383
#endif

we have a preprocessor macro MAX_LINE_LEN that defaults to 16383, but can be overridden at compile time by using compiler options. (With GCC, Intel CC, and clang, you can use e.g. -DMAX_LINE_LEN=8191 to halve it.)

In main(), I like to print the usage if the parameter count is incorrect, or -h or --help is the first parameter:

int main(int argc, char *argv[])
{
    char              buffer[MAX_LINE_LEN + 1];
    char             *line;
    size_t            size, i;
    struct hashtable  table;
    char              dummy;

    if (argc != 2 || !strcmp(argv[1], "-h") || !strcmp(argv[1], "--help")) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help ]\n", argv[0]);
        fprintf(stderr, "       %s ENTRIES < DATA-FILE > DOT-FILE\n", argv[0]);
        fprintf(stderr, "\n");
        fprintf(stderr, "This program reads lines from DATA-FILE, adding them to\n");
        fprintf(stderr, "a hash table with ENTRIES slots and hash chaining.\n");
        fprintf(stderr, "When all input lines have been read, the contents of the\n");
        fprintf(stderr, "hash table slots will be output as a Graphviz DOT format graph.\n");
        fprintf(stderr, "\n");
        return EXIT_SUCCESS;
    }

Next, we can try and parse the first command-line parameter to size_t size;. I like to use a "sentinel" character, to detect if the parameter has garbage after the value (other than whitespace):

    if (sscanf(argv[1], "%zu %c", &size, &dummy) != 1 || size < 1) {
        fprintf(stderr, "%s: Invalid number of hash table entries.\n", argv[1]);
        return EXIT_FAILURE;
    }
    hashtable_init(&table, size);

The next part is to read each line from standard input, and add them to the hash table.

    while (1) {

        line = fgets(buffer, sizeof buffer, stdin);
        if (!line)
            break;

        /* Skip leading ASCII whitespace. */
        line += strspn(line, "\t\n\v\f\r ");

        /* Find out the remaining length of the line. */
        size = strlen(line);

        /* Ignore trailing ASCII whitespace. */
        while (size > 0 && (line[size-1] == '\t' || line[size-1] == '\n' ||
                            line[size-1] == '\v' || line[size-1] == '\f' ||
                            line[size-1] == '\r' || line[size-1] == ' '))
            size--;

        /* Ignore empty lines. */
        if (size < 1)
            continue;

        /* Add line to the hash table. */
        hashtable_add(&table, line, size);
    }

    /* Check if fgets() failed due to an error, and not EOF. */
    if (ferror(stdin) || !feof(stdin)) {
        fprintf(stderr, "Error reading from standard input.\n");
        return EXIT_FAILURE;
    }

At this point, we have table with size slots. I write my test programs to write either plain text output (if simple), or Graphviz DOT format output (if structured like graphs). In this case, the graph output format sounds better.

    /* Print the hash table contents as a directed graph, with slots as boxes. */
    printf("digraph {\n");

    for (i = 0; i < table.size; i++) {
        struct hashentry *next = table.slot[i];

        /* The slot box. */
        printf("    \"%zu\" [ shape=\"box\", label=\"%zu\" ];\n", i, i);

        if (next) {

            /* The edge from the slot box to the entry oval. */
            printf("    \"%zu\" -> \"%p\";\n", i, (void *)next);

            while (next) {
                struct hashentry *curr = next;

                /* Each entry oval; text shown is the value read from the file. */
                printf("    \"%p\" [ shape=\"oval\", label=\"%s\" ];\n", (void *)curr, curr->data);

                next = next->next;

                /* The edge to the next oval, if any. */
                if (next)
                    printf("    \"%p\" -> \"%p\";\n", (void *)curr, (void *)next);
            }
        } 
    }

    printf("}\n");
    return EXIT_SUCCESS;
}

That's it. If you compile and run the above program with 10 as the command-line argument, and feed

one
two
three
four
five
six
seven
eight
nine
ten

to its standard input, it will output

digraph {
    "0" [ shape="box", label="0" ];
    "1" [ shape="box", label="1" ];
    "1" -> "0xb460c0";
    "0xb460c0" [ shape="oval", label="three" ];
    "0xb460c0" -> "0xb46080";
    "0xb46080" [ shape="oval", label="one" ];
    "2" [ shape="box", label="2" ];
    "3" [ shape="box", label="3" ];
    "3" -> "0xb46180";
    "0xb46180" [ shape="oval", label="nine" ];
    "0xb46180" -> "0xb460a0";
    "0xb460a0" [ shape="oval", label="two" ];
    "4" [ shape="box", label="4" ];
    "4" -> "0xb46160";
    "0xb46160" [ shape="oval", label="eight" ];
    "0xb46160" -> "0xb46140";
    "0xb46140" [ shape="oval", label="seven" ];
    "5" [ shape="box", label="5" ];
    "5" -> "0xb46100";
    "0xb46100" [ shape="oval", label="five" ];
    "6" [ shape="box", label="6" ];
    "6" -> "0xb461a0";
    "0xb461a0" [ shape="oval", label="ten" ];
    "7" [ shape="box", label="7" ];
    "7" -> "0xb46120";
    "0xb46120" [ shape="oval", label="six" ];
    "0xb46120" -> "0xb460e0";
    "0xb460e0" [ shape="oval", label="four" ];
    "8" [ shape="box", label="8" ];
    "9" [ shape="box", label="9" ];
}

which fed to Graphviz dot will generate a nice graph:

Hash table graph

If you want to see the actual hash values above the data strings, change to

                /* Each entry oval; text shown is the value read from the file. */
                printf("    \"%p\" [ shape=oval, label=\"%zu:\\n%s\" ];\n", (void *)curr, curr->hash, curr->data);

As I said, the DJB2 Xor hash is not particularly good, and for the above input, you need at least 43 hash table slots to avoid chaining.

Nominal Animal
  • 38,216
  • 5
  • 59
  • 86