Best Data Structure for storing license plates and searching if a given license plate exists

Question

I am attempting to write a program that determines whether a specific license plate is one of 10,000 that I have stored. I want to write a fast response algorithm first and foremost, with memory usage as a secondary objective. Would a balanced binary search tree or a hash table be more sufficient in storing the 10,000 license plate numbers (which also contain letters)?

If you are looking at speed is Python the best language for this? — , Dec 20 '13 at 00:53
In hindsight, probably not. I can look into a different language — user3068647, Dec 20 '13 at 00:55
10000 license plates is nothing for a normal computer, Python can handle it perfectly fine. — Matteo Italia, Dec 20 '13 at 00:57
@LegoStormtroopr: Python's hash table implementation is probably faster than anything you're likely to write yourself in another language, and faster than most of the popular C hash table libraries, so if that's the slow part of his code, yes, Python may well be the best language for this. — abarnert, Dec 20 '13 at 01:03
@MatteoItalia I'm not saying Python can't, and I'm a fan of the language. But if its a need to process 10000 licence plates, every second as cars travel by, giving near instance feedback to an officer in a car, the difference between Python and C could be substantial. — , Dec 20 '13 at 01:03
@LegoStormtroopr: You're missing the whole point of a hash table (and, for that matter, a binary tree). You don't need to process 10000 license plates, you need to hash 1 license plate and see whether it exists (or process log(10000) ~ 13 license plates). The fact that Python takes 50x longer for loop overhead than C is irrelevant when your loop only takes microseconds even in Python and only has to run 1 (or 13) times per second. — abarnert, Dec 20 '13 at 01:05
So, what is the actual requirement here? A BST won't be as fast as a hash table, but it will be "fast". Is it fast enough? Nobody can answer that unless they know what you're doing it for. seaotternerd's answer gives you enough information to decide that for yourself, but if you'd stated your actual needs you could have gotten a better answer. — abarnert, Dec 20 '13 at 01:10
I was mostly concerned with determining which data structure would be quickest in determining whether a given string is present in a database or not. You have all more than answered my question. Thank You! — user3068647, Dec 20 '13 at 01:13

score 7 · Accepted Answer · answered Dec 20 '13 at 00:52

7

A hash table takes O(1) time to look up any given entry (i.e. to check whether or not it is in the data structure), whereas a binary search tree takes O(logn) time. Therefore, a hash table will be a more efficient option in terms of response speed.

Binary search trees are more useful in scenarios where you need to display things in order, or find multiple similar entries.

answered Dec 20 '13 at 00:52

seaotternerd

6,298
2
47
58

1

Also, typical implementations of hash tables tend to be more memory efficient. – Matteo Italia Dec 20 '13 at 00:56
2

Plus, Python comes with an excellent hash table built in, while binary trees, you'll have to pick from various different third-party implementations, none of which have gotten as much testing and optimization effort over the years as Python's builtins. – abarnert Dec 20 '13 at 01:06
1

That being said, even a slow, designed-for-teaching-rather-than-performance, pure-Python binary tree should be more than fast enough, and memory-efficient enough, to handle a lookup against 10000 plates in almost any use case you can imagine, on even the slowest modern computers. So, most likely this whole question is irrelevant. – abarnert Dec 20 '13 at 01:08

score 5 · Answer 2 · answered Dec 20 '13 at 01:18

Sounds like you really want a Python set (which is hash-table based). Here's a sample:

from random import choice
from string import ascii_uppercase as LETTERS
from string import digits as DIGITS
print LETTERS
print DIGITS

def get_plate():
    return choice(LETTERS) + \
           choice(LETTERS) + \
           choice(LETTERS) + \
           choice(DIGITS) + \
           choice(DIGITS) + \
           choice(DIGITS)

licenses = set()
while len(licenses) < 10000:
    licenses.add(get_plate())

from time import time
t1 = time()
for _ in xrange(1000000):
    get_plate() in licenses
t2 = time()
print t2 - t1

That builds a set with 10,000 random plates (3 uppercase letters and 3 digits each), then checks a million random plates to see whether they're in that set. Here's the output:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789
5.88199996948

So on this box it took under 6 seconds to check a million plates - and by far most of that time was consumed by generating the million random plates. Change to:

a_plate = get_plate()
t1 = time()
for _ in xrange(1000000):
    a_plate in licenses
...

and it's 35 times faster. That should be quick enough to handle the traffic on even a moderately busy road ;-)

score 1 · Answer 3 · edited May 23 '17 at 10:30

A python dictionary is what you want. Dictionary lookups are O(1). A python dictionary is effectively a hash table so there is some memory overhead but lookups are fast.

Refer to a previous stackoverflow question here.

You could even store data meta-data about the license plate very easily. Your keys can be any form of license plate you want, integers, strings, or even a tuple.

Example code:

license_plates = {'AB-12-23' :{'make':'Dodge','color':'White'},
                  'SWINGER'  :{'make':'Alpha','color':'Blue'},
                  ('ABC',123):{'make':'Ford','color':'Yellow'},
                  123456     :{'make':'Ferrari','color':'Red'}}

print '%s in list of license plates? %s' % ('SWINGER', 'SWINGER' in license_plates)

Best Data Structure for storing license plates and searching if a given license plate exists

3 Answers3