1

Let's say you want to read from file and based on its content create a structure (or an array) containing multiple objects, for example:

struct { 
    unsigned id; 
    char name[16]; 
    float price; 
} *items;`

Then you want to reference to an object (some item) using the obtained name, because that's how a user would know what to look for.

However, implementing searches that use loops will be very slow, especially if you have to loop every time you want to access an item and you need to access it all the time. Converting string to integer and then using a lookup table (sacrificing memory for performance) is a solution, but what if the name is longer than 8 bytes.


What is the fastest approach of accessing allocated structure, filled from an arbitrary file, using name identifiers (a string from the structure) ?

  • Take a look at the example of [bsearch()](http://man7.org/linux/man-pages/man3/bsearch.3.html). – Iharob Al Asimi Nov 02 '17 at 11:05
  • 1
    This is a broad question. You can start from here: https://en.wikipedia.org/wiki/Associative_array – hyde Nov 02 '17 at 11:08
  • @IharobAlAsimi Ahh I should've tagged mingw – SmokyPerson Nov 02 '17 at 11:08
  • @SmokyPerson The platform is irrelevant, the link is for standard c. The example is ANSI c too. – Iharob Al Asimi Nov 02 '17 at 11:09
  • 2
    *"What is the fastest approach"* depends on a lot of factors. There's not generic answer to that question. But in general, use hash table based associative array, if your only requirement is fast lookup. If you **also** want to produce a sorted/ordered list of all records fast, then there are more things to consider – hyde Nov 02 '17 at 11:12

2 Answers2

3

Using a binary search tree for this is ideal. Of course, more advanced structures like B-Trees will probably increase performance. But knowing about this is all you need really, and binary search trees are pretty efficient in many cases like the one you describe above, they still are simple and easy to implement.

A simple method is to use bsearch() from the standard library, but then insertion into the data collection becomes difficult or inefficient.

These are still loop based, but they are far more efficient than linear lookup.

Iharob Al Asimi
  • 52,653
  • 6
  • 59
  • 97
  • How would you compare efficiency of using `bsearch()` vs "hash table based associative array" whatever that means – SmokyPerson Nov 02 '17 at 11:18
  • Hash table is more efficient, but harder to implement. Binary search is very efficient and very easy to implement. Lookup "*Complexity*" of each of these methods. Also space is another difference, [read this too](https://stackoverflow.com/questions/4128546/advantages-of-binary-search-trees-over-hash-tables). – Iharob Al Asimi Nov 02 '17 at 11:24
  • Useful. I'll go with binary search as it may conserve more memory. Now I remember I used hash tables before – SmokyPerson Nov 02 '17 at 11:35
1

One approach would be to use a hash table using a hashing function that takes a string.

As long as the table is suitably sized and the hashing function distributes evenly across the table it will remain efficient to add and find entries. But, if collisions occur, then some looping will be required to search through the entries with the same hash value.

Luke Smith
  • 893
  • 4
  • 8
  • You should note that collisions should be handled and it's not a simple topic. – Iharob Al Asimi Nov 02 '17 at 11:13
  • @IharobAlAsimi you are correct, I was implying this through the comments on sizing and distribution, but should probably have been more explicit. I've expanded my answer a little. – Luke Smith Nov 02 '17 at 11:15