Using string as an array-index identifier (without loop)

Question

Let's say you want to read from file and based on its content create a structure (or an array) containing multiple objects, for example:

struct { 
    unsigned id; 
    char name[16]; 
    float price; 
} *items;`

Then you want to reference to an object (some item) using the obtained name, because that's how a user would know what to look for.

However, implementing searches that use loops will be very slow, especially if you have to loop every time you want to access an item and you need to access it all the time. Converting string to integer and then using a lookup table (sacrificing memory for performance) is a solution, but what if the name is longer than 8 bytes.

What is the fastest approach of accessing allocated structure, filled from an arbitrary file, using name identifiers (a string from the structure) ?

Take a look at the example of [bsearch()](http://man7.org/linux/man-pages/man3/bsearch.3.html). — Iharob Al Asimi, Nov 02 '17 at 11:05
This is a broad question. You can start from here: https://en.wikipedia.org/wiki/Associative_array — hyde, Nov 02 '17 at 11:08
@SmokyPerson The platform is irrelevant, the link is for standard c. The example is ANSI c too. — Iharob Al Asimi, Nov 02 '17 at 11:09
*"What is the fastest approach"* depends on a lot of factors. There's not generic answer to that question. But in general, use hash table based associative array, if your only requirement is fast lookup. If you **also** want to produce a sorted/ordered list of all records fast, then there are more things to consider — hyde, Nov 02 '17 at 11:12

Iharob Al Asimi · Accepted Answer · 2017-11-02T11:14:32.093

3

Using a binary search tree for this is ideal. Of course, more advanced structures like B-Trees will probably increase performance. But knowing about this is all you need really, and binary search trees are pretty efficient in many cases like the one you describe above, they still are simple and easy to implement.

A simple method is to use bsearch() from the standard library, but then insertion into the data collection becomes difficult or inefficient.

These are still loop based, but they are far more efficient than linear lookup.

edited Nov 02 '17 at 11:14

answered Nov 02 '17 at 11:07

Iharob Al Asimi

52,653
6
59
97

How would you compare efficiency of using `bsearch()` vs "hash table based associative array" whatever that means – SmokyPerson Nov 02 '17 at 11:18
Hash table is more efficient, but harder to implement. Binary search is very efficient and very easy to implement. Lookup "*Complexity*" of each of these methods. Also space is another difference, [read this too](https://stackoverflow.com/questions/4128546/advantages-of-binary-search-trees-over-hash-tables). – Iharob Al Asimi Nov 02 '17 at 11:24
Useful. I'll go with binary search as it may conserve more memory. Now I remember I used hash tables before – SmokyPerson Nov 02 '17 at 11:35

Luke Smith · Answer 2 · 2017-11-02T11:18:00.280

1

One approach would be to use a hash table using a hashing function that takes a string.

As long as the table is suitably sized and the hashing function distributes evenly across the table it will remain efficient to add and find entries. But, if collisions occur, then some looping will be required to search through the entries with the same hash value.

edited Nov 02 '17 at 11:18

answered Nov 02 '17 at 11:13

Luke Smith

893
4
8

You should note that collisions should be handled and it's not a simple topic. – Iharob Al Asimi Nov 02 '17 at 11:13
@IharobAlAsimi you are correct, I was implying this through the comments on sizing and distribution, but should probably have been more explicit. I've expanded my answer a little. – Luke Smith Nov 02 '17 at 11:15

Using string as an array-index identifier (without loop)

2 Answers2