3

In order to simplify the development of future school assignments I decided to create an API (is that what you would call it?) for two data structures I commonly use -- a linked list and a hash table.

In developing each of these I ended up with the following two insert functions:

int list_insert(list *l, char *data, unsigned int idx);
int hash_insert(hash_table **ht, char *data);

The list_insert() function (and all of the list functions) ended up being pass-by-value since I never had any need to directly modify the list * itself unless I was malloc'ing or free'ing it. However, because I wanted to include auto-rehashing in my hash table I found that I had to pass the table by-reference instead of by-value in any function that might force a rehash. Now I end up with syntax like the following:

list_insert(l, "foo", 3);
hash_insert(&ht, "foo");

The difference strikes me as a little odd and I found myself wondering if I should change the list functions to be pass-by-reference as well for consistency's sake -- even though none of my functions would need to leverage it. What's the typical consensus here? Should I only pass-by-reference if my function actually needs to modify its arguments or should I pass-by-reference for the sake of consistency?

Structure definitions:

typedef struct list_node list_node;
struct list_node {
    char *data;
    list_node *next;
    list_node *prev;
};

typedef struct list list;
struct list {
    list_node *head;
    list_node *tail;
    size_t size;
};

typedef struct hash_table hash_table;
struct hash_table {
    list **table;
    size_t entries;
    size_t buckets;
    float maxLoad;
    unsigned int (*hash)(char*, unsigned int);
};

List functions:

list *list_createList();
list_node *list_createNode();
void list_destroyList(list *l);
void list_destroyNode(list_node *n);
int list_append(list *l, char *data);
int list_insert(list *l, char *data, unsigned int idx);
int list_remove(list *l, char *data, int (*compar)(const void*, const void*));
void list_push(list *l, char *data);
char *list_pop(list *l);
int list_count(list *l, char *data, int (*compar)(const void*, const void*));
int list_reverse(list *l);
int list_sort(list *l, int (*compar)(const void*, const void*));
int list_print(list *l, void (*print)(char *data));

Hash functions:

hash_table *hash_createTable(size_t buckets, float maxLoad, unsigned int (*hash)(char*, unsigned int));
void hash_destroyTable(hash_table *ht);
list *hash_list(const hash_table **ht);
int hash_checkLoad(hash_table **ht);
int hash_rehash(hash_table **ht);
int hash_insert(hash_table **ht, char *data);
void hash_stats(hash_table *ht);
int hash_print(hash_table *ht, void (*print)(char*));
user2506293
  • 805
  • 1
  • 7
  • 13
  • What info does a `hash_table` resp. a `list` contain? Perhaps modifying the `list*` passed by reference makes as much sense as modifying the `hash_table*` on further consideration... – Deduplicator Dec 27 '14 at 07:49
  • Can you please let know what exactly your API's does and how your structures looks like? – Gopi Dec 27 '14 at 07:51
  • 4
    Suggest [Code Review](http://codereview.stackexchange.com) rather than here. – chux - Reinstate Monica Dec 27 '14 at 07:53
  • Why do you allocate the root on the heap? Just return it by value for creation (if you insist aggregate-initialization with `{0}` does not cut it), and always pass it by reference, and you get similar APIs for both. – Deduplicator Dec 27 '14 at 08:06
  • @gopi I added the structures and function prototypes. I think they're relatively self explanatory although if more information is needed I will be happy to provide it. – user2506293 Dec 27 '14 at 08:06
  • @Deduplicator I'm not sure why exactly -- that's just how I was always taught to do it. If I am understanding you correctly you are saying that I should simply declare the root `list` or `hash_table` structure on the stack since there is no need to allocate memory dynamically? – user2506293 Dec 27 '14 at 08:15
  • Yes. Actually, there could be one possible reason, if you want to make `list` and `hash_table` opaque types to enable completely changing them without recompiling the callers. – Deduplicator Dec 27 '14 at 08:18
  • In other words, that way I could change the size of `list` or `hash_table` without recompiling the caller? If that were the case, how would you suggest addressing the dissimilar API calls? – user2506293 Dec 27 '14 at 08:32

2 Answers2

1

That's up to you and takes a little intuition. When passing large structs I pass by reference so that I am not eating up extra stack space and burning cycles copying the struct. But with small struts like yours it may be more efficient to use the stack depending on your target processor, how often you are using the values, and what your compiler does. Your compiler may break that struct up and put its values into registers.

But if you do pass by reference and do not intend to modify the value it is best practice to pass a pointer to const, eg: const list * l. That way there isn't any risk of you accidentally modifying the value and it makes the interface cleaner- now the caller knows that the value won't be changing.

Consistency is nice and I personally would lean in that direction especially on large interface because it may make things easier in the long run, but I would definitely use const. In doing so you allow the compiler to discover any accidental assignments so that later you don't need to track down a hard to bug.

See also: Passing a struct to a function in C

Community
  • 1
  • 1
Nick
  • 1,361
  • 1
  • 14
  • 42
  • He's not passing structs, though, only pointers to structs (list *) and pointers to pointers to structs (hash_table **). – netcat Dec 27 '14 at 13:17
  • @netcat he is asking if he should pass the struct list directly if he does not intend to modify it or if he should pass a pointer to the struct is how I understood the question. Is that not correct? – Nick Dec 27 '14 at 13:27
1

Here is a general rule of thumb:

  • pass by value if its typdef is a native type (char, short, int, long, long long, double or float)
  • pass by reference if it is a union, struct or array

Additional considerations for passing by reference:

  • use const if it will not be modified
  • use restrict if pointers will not point to the same address

Sometimes a struct/union seems like the appropriate type, but can be replaced with arrays if the types are similar. This can help with optimization (loop vectorization for example)

technosaurus
  • 7,676
  • 1
  • 30
  • 52
  • Are you suggesting that the list structure, for example, is unnecessary? And that it would be better to instead declare an array of three pointers? Also, what are your thoughts on allocating the root of the list/hashtable on the heap as opposed to the stack? – user2506293 Dec 27 '14 at 20:14
  • Though I have used arrays for lists, I was more referring to structs with r,g,b,a or x,y,dx,dy where they are all the same type. As to heap v. stack I try to organize new code to use the stack where possible so I can't forget to free but sometimes *alloc is unavoidable especially with existing code. Either way it's good to be able to analyze the assembly output for comparison since modern compilers may optimize malloc calls to use efficient built in intrinsics. – technosaurus Dec 27 '14 at 22:53