33

I am trying to find corresponding keys in two different dictionaries. Each has about 600k entries.

Say for example:

    myRDP = { 'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT' }
    myNames = { 'Actinobacter': '8924342' }

I want to print out the value for Actinobacter (8924342) since it matches a value in myRDP.

The following code works, but is very slow:

    for key in myRDP:
        for jey in myNames:
            if key == jey:
                print key, myNames[key]

I've tried the following but it always results in a KeyError:

    for key in myRDP:
        print myNames[key]

Is there perhaps a function implemented in C for doing this? I've googled around but nothing seems to work.

Timur Shtatland
  • 12,024
  • 2
  • 30
  • 47
Austin Richardson
  • 8,078
  • 13
  • 43
  • 49

11 Answers11

53

Use sets, because they have a built-in intersection method which ought to be quick:

myRDP = { 'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT' }
myNames = { 'Actinobacter': '8924342' }

rdpSet = set(myRDP)
namesSet = set(myNames)

for name in rdpSet.intersection(namesSet):
    print name, myNames[name]

# Prints: Actinobacter 8924342
RichieHindle
  • 272,464
  • 47
  • 358
  • 399
  • Of course, you have the time hit of creating the sets in the first place. I wonder if `[x for x in myRDP if x in myNames]` is quicker or slower than creating the two sets and taking the intersection? – John Fouhy Aug 23 '09 at 00:38
  • Without @Audism's data set we can't be sure, but my money would be on the sets being faster, even though they'll take some time to create. – RichieHindle Aug 23 '09 at 00:41
  • This one also took about 1 second. I didn't really notice any difference from creating the sets. – Austin Richardson Aug 23 '09 at 01:00
  • I did some tests using fake data: keys were strings generated as `repr(math.log(i))[-8:]` for i from 1..600000 (first dict) and for i from 300000..900000 (second dict). Dict values were None. Method using sets took about .4s, versus .3s for the list comprehension in my earlier comment. – John Fouhy Aug 23 '09 at 04:39
  • @John: OK, interesting. Python never ceases to impress me! – RichieHindle Aug 23 '09 at 09:50
  • 10
    ~6 years of Python later and I think this is the best option. – Austin Richardson Jun 01 '15 at 19:03
49

You could do this:

for key in myRDP:
    if key in myNames:
        print key, myNames[key]

Your first attempt was slow because you were comparing every key in myRDP with every key in myNames. In algorithmic jargon, if myRDP has n elements and myNames has m elements, then that algorithm would take O(n×m) operations. For 600k elements each, this is 360,000,000,000 comparisons!

But testing whether a particular element is a key of a dictionary is fast -- in fact, this is one of the defining characteristics of dictionaries. In algorithmic terms, the key in dict test is O(1), or constant-time. So my algorithm will take O(n) time, which is one 600,000th of the time.

John Fouhy
  • 41,203
  • 19
  • 62
  • 77
  • By the way, if that "O(n×m)" stuff is confusing you, this question might help: http://stackoverflow.com/questions/107165/big-o-for-eight-year-olds – John Fouhy Aug 23 '09 at 00:55
  • I'm not sure what your code did. I think it was printing out every key from myRDP and myNames, but kudos for the explanation – Austin Richardson Aug 23 '09 at 00:57
  • 2
    Inspect the second line carefully; it's not `for key in myNames`, it's `if key in myNames`. – Andrew Keeton Aug 23 '09 at 01:00
  • 1
    Also, +1 for explaining _why_ this method is faster. – Andrew Keeton Aug 23 '09 at 01:01
  • whoops! Your real code was very fast too. It took about a second. My original code would have taken ~166 hours! – Austin Richardson Aug 23 '09 at 01:03
  • Yep this is the right answer. Audism was doing a linear search on a dictionary. I don't get why people are suggesting using sets and intersections below... – Unknown Aug 23 '09 at 02:34
  • Sets and intersections are still useful for a more general case: when combining a list of dictionaries instead of just two. – John Jan 10 '19 at 11:33
34

in python 3 you can just do

myNames.keys() & myRDP.keys()

boson
  • 886
  • 1
  • 12
  • 25
9
for key in myRDP:
    name = myNames.get(key, None)
    if name:
        print key, name

dict.get returns the default value you give it (in this case, None) if the key doesn't exist.

Andrew Keeton
  • 22,195
  • 6
  • 45
  • 72
7

You could start by finding the common keys and then iterating over them. Set operations should be fast because they are implemented in C, at least in modern versions of Python.

common_keys = set(myRDP).intersection(myNames)
for key in common_keys:
    print key, myNames[key]
Roberto Bonvallet
  • 31,943
  • 5
  • 40
  • 57
3

Best and easiest way would be simply perform common set operations(Python 3).

a = {"a": 1, "b":2, "c":3, "d":4}
b = {"t1": 1, "b":2, "e":5, "c":3}
res = a.items() & b.items() # {('b', 2), ('c', 3)} For common Key and Value
res = {i[0]:i[1] for i in res}  # In dict format
common_keys = a.keys() & b.keys()  # {'b', 'c'}

Cheers!

vikas0713
  • 566
  • 7
  • 9
2

Use the get method instead:

 for key in myRDP:
    value = myNames.get(key)
    if value != None:
      print key, "=", value
João Silva
  • 89,303
  • 29
  • 152
  • 158
2

You can simply write this code and it will save the common key in a list. common = [i for i in myRDP.keys() if i in myNames.keys()]

0

Copy both dictionaries into one dictionary/array. This makes sense as you have 1:1 related values. Then you need only one search, no comparison loop, and can access the related value directly.

Example Resulting Dictionary/Array:

[Name][Value1][Value2]

[Actinobacter][GATCGA...TCA][8924342]

[XYZbacter][BCABCA...ABC][43594344]

...

Alex
  • 75,813
  • 86
  • 255
  • 348
  • There isn't a 1:1 relation. There should be slightly more values in myNames. Would this still work? – Austin Richardson Aug 23 '09 at 00:40
  • Yes this would still work. You just instantiate a new array and copy these values horizontally together on match. See example above. – Alex Aug 23 '09 at 01:30
0

Here is my code for doing intersections, unions, differences, and other set operations on dictionaries:

class DictDiffer(object):
    """
    Calculate the difference between two dictionaries as:
    (1) items added
    (2) items removed
    (3) keys same in both but changed values
    (4) keys same in both and unchanged values
    """
    def __init__(self, current_dict, past_dict):
        self.current_dict, self.past_dict = current_dict, past_dict
        self.set_current, self.set_past = set(current_dict.keys()), set(past_dict.keys())
        self.intersect = self.set_current.intersection(self.set_past)
    def added(self):
        return self.set_current - self.intersect 
    def removed(self):
        return self.set_past - self.intersect 
    def changed(self):
        return set(o for o in self.intersect if self.past_dict[o] != self.current_dict[o])
    def unchanged(self):
        return set(o for o in self.intersect if self.past_dict[o] == self.current_dict[o])

if __name__ == '__main__':
    import unittest
    class TestDictDifferNoChanged(unittest.TestCase):
        def setUp(self):
            self.past = dict((k, 2*k) for k in range(5))
            self.current = dict((k, 2*k) for k in range(3,8))
            self.d = DictDiffer(self.current, self.past)
        def testAdded(self):
            self.assertEqual(self.d.added(), set((5,6,7)))
        def testRemoved(self):      
            self.assertEqual(self.d.removed(), set((0,1,2)))
        def testChanged(self):
            self.assertEqual(self.d.changed(), set())
        def testUnchanged(self):
            self.assertEqual(self.d.unchanged(), set((3,4)))
    class TestDictDifferNoCUnchanged(unittest.TestCase):
        def setUp(self):
            self.past = dict((k, 2*k) for k in range(5))
            self.current = dict((k, 2*k+1) for k in range(3,8))
            self.d = DictDiffer(self.current, self.past)
        def testAdded(self):
            self.assertEqual(self.d.added(), set((5,6,7)))
        def testRemoved(self):      
            self.assertEqual(self.d.removed(), set((0,1,2)))
        def testChanged(self):
            self.assertEqual(self.d.changed(), set((3,4)))
        def testUnchanged(self):
            self.assertEqual(self.d.unchanged(), set())
    unittest.main()
hughdbrown
  • 47,733
  • 20
  • 85
  • 108
0
def combine_two_json(json_request, json_request2):
intersect = {}

for item in json_request.keys():
    if item in json_request2.keys():
        intersect[item]=json_request2.get(item)
return intersect
double-beep
  • 5,031
  • 17
  • 33
  • 41
  • 2
    While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please [edit] your answer to add explanations and give an indication of what limitations and assumptions apply. – Adrian Mole May 30 '20 at 11:35