2

Once again, talking about my upcoming university project... I had a class today, where we could ask stuff about the project but I still haven't decided on the best way to do this.

Basically, I have a bunch of users (some struct with a couple of members) which must be quickly searched by name and by SSN. Since I will also need to use theses users on a Graph (for other operations), I will be working with pointers.

Now, I though about using two hash tables. One where the key is the name and another where the key is the SSN. But I don't like the idea of having two Hash Tables, simply with different keys and pointing to the same place.

It crossed my mind using a Hash Table with two keys but I don't even know if that is possible and I believe it's not. I just can't think of a way to do it, maybe there is one, or maybe not.

Besides these two solutions, I can't think of any other alternative... I may have to go with the two Hash Tables.

Do you guys suggest any other alternative?

rfgamaral
  • 16,546
  • 57
  • 163
  • 275
  • What do you mean by "use these users on a graph"? Are the users given all at once and then you must search for a bunch of them by name/SSN, or are you given some users, then some search queries, then more users, then more queries etc.? – IVlad Mar 02 '10 at 19:13
  • Please don't ever ever ever use SSN as a key :-( – Steven Schlansker Mar 02 '10 at 19:15
  • Don't bother with the graph part, that just means that I'll need to use a graph for something else but I cannot explain you what as I'm not there yet. @Steven Why not? Care to justify why? And, perhaps, provide an alternative? How do you recommend I quickly lookup users by SSN? – rfgamaral Mar 02 '10 at 19:23
  • @Nazgulled It's generally a bad idea to use a SSN as an index/key due to the added complexity involved (the extra decrypt/encrypt steps). That is, unless you are using critical personal data unencrypted (which you'd never do, right?). If you want to index into the table by SSN, run the SSN through a cryptographic hash as soon as you get it (something like MD5 should suffice) and do all of your indexing, etc using the hashed version. – bta Mar 02 '10 at 19:39
  • You shouldn't use SSN because 1) they change (you can get a new one issued in case of fraud) 2) they are considered private information, and it's hard(er) to protect your primary keys 3) they are reused (after death of their previous owner) – Steven Schlansker Mar 02 '10 at 20:03
  • This is a university project, not a real world application. There's no need for data encryption and a new one will not be issued, reused or whatever. No real data will be used in the application. :) – rfgamaral Mar 02 '10 at 20:13
  • Okay, just as long as you understand why I raised the flag. Habits learned at university tend to find their way into the real world! My personal information (SSN, etc) has been stolen no fewer than three times from different universities (only one of which I actually attended) so I feel pretty strongly about this :) – Steven Schlansker Mar 02 '10 at 20:50
  • I understand and you're a absolutely right. If this was a real application, I would use encryption of course. But it's pointless to bother with that in this project. – rfgamaral Mar 02 '10 at 20:58

5 Answers5

2

I'd go with two hash tables. Think of it as two indexes into a database. The database is your users, and you provide two indexes: one ssn index and one name index.

Hans W
  • 3,851
  • 1
  • 22
  • 21
  • That's what I think but what's bugging me is that I will always waste 2 pointers for the same user structure, one in each hash table. Maybe there's no way around it... – rfgamaral Mar 02 '10 at 19:20
  • 1
    The extra space for keeping two indexes shouldn't be an issue unless you are planning to store massive amounts of users. The issue is the added complexity of keeping the indexes up to date when adding, removing, renaming users or whatever you plan to do. If you think this could easily be handled, then I'd say this is the way to go. – Hans W Mar 02 '10 at 19:25
1

I think that two Hashtables are ok. Consider binary search trees also, they can be more compact but O(log n) search and harder to implement.

"Hash Table with two keys" never heard of it...

Andrey
  • 59,039
  • 12
  • 119
  • 163
  • Me neither, I just though it could exist lol... I considered binary search trees but like you said, they are O(log n), where hash tables is way faster (in the best case of course). Also, binary search trees could be an option if I was able to implement a balanced one, but I'm having trouble with that and I'll just skip it. – rfgamaral Mar 02 '10 at 19:11
  • of course you should use balanced tree. you shouldn't skip it, it is useful knowledge/skill. just google "avl tree" or "red-black tree", there are a lot of tutorials. – Andrey Mar 02 '10 at 19:16
  • I should skip it because I have deadlines and I don't have time to be learning new stuff at the moment. I'm not gonna fail this class just because I want to be more knowledgeable or skillful. I have time for that, to finish the project? Not so much... – rfgamaral Mar 03 '10 at 12:40
  • Balanced trees are not as compact as open-addressed hash tables. – Seun Osewa Sep 02 '10 at 02:12
1

I don't think there is a way to build a single hash table which supports two keys.

If you want both SSN-lookup and name-lookup to be really fast, then you need two hash tables. You have to remember to add to both of them, or remove from both of them.

Otherwise, you can make the more frequent one (e.g. SSN-lookup) as a hash-based lookup, and the other one as brute-force lookup from the hash table.

Arun
  • 19,750
  • 10
  • 51
  • 60
  • Naah... If I use brute-force lookup, I'll probably have a very low grade. The point of this project it to use the "best" data structures (that we learned in the previous semester) for each case. As long as we properly justify them, we can use anything. As I see it, Hash Tables are the answer (for my project of course). – rfgamaral Mar 02 '10 at 19:16
  • @Nazgulled: Thanks for clarifying the context. It seems minimizing Time-complexity is your goal. However, depending on the context, minimizing Space-complexity is a noble goal too! :-) Make sure you identify what is your goal. – Arun Mar 02 '10 at 19:32
1
  1. Two hash tables like you said. The advantage is that lookup will be very fast for RANDOM data or even real-life data. The disadvantage is that you don't know what your professors will throw at it (or do you?) and they might force the worst case.

  2. Balanced search trees. I recommend treaps: http://en.wikipedia.org/wiki/Treap - they are, in my opinion, the easiest to implement.

  3. Sort your users and binary search. Also O(log N) per search, and even easier to implement than a treap.

  4. A combination of hashes + sorted users / search trees, if you can afford the memory. This will make it O(1) best case and O(log N) worst case. If H[i] = list of objects that hashed to i, keep a count for each i that tells you how many objects are in that list. If that count is too big, use the sorted users list / search tree instead.

IVlad
  • 43,099
  • 13
  • 111
  • 179
  • They are preparing a data sample for we test our program yes, but we don't yet have access to it. But how can they force the best/worst case? Doesn't that depend on the hash function? – rfgamaral Mar 02 '10 at 19:25
  • 1
    It does depend on the hash function, but if they have access to your hash function they might be able to come up with test data that will trigger its worst case behavior. I don't think they'll actually go to those lengths, but it is possible. It's enough to find two different names that map to the same place in your hash table, then give you a list of 100 000 users, all having those two different names. Then searching is going to be really slow. For best running time, your best option is #4 on my list. The only disadvantage there is the memory used, but it does avoid worst case behavior. – IVlad Mar 02 '10 at 19:34
  • I don't think they will bother that far lol... Thanks for the tip though :) – rfgamaral Mar 02 '10 at 20:11
  • Well, surely you'll impress your professor if you get it to O(log N) worst case :). In fact, you can build the hash table first, then only keep a tree / sorted list if your hash table is not balanced enough. That way, you only use more memory in the worst case. – IVlad Mar 02 '10 at 20:17
  • We'll see, one thing at a time... :) – rfgamaral Mar 03 '10 at 12:41
0

What about concatenate the two keys and use as key?

Example i had x , y , z.

Concatanete x and y using a string or char as a separator. This is a simple way to do it.

At this post I see what could be more interesting to this solution: Multi-dimensional associative arrays in javascript

Community
  • 1
  • 1