
I have two dictionaries with key-value pairs as follows:

dict-1  ch:23, 100
        ch:24, 95

dict-2  ch:23, 98
        ch:25, 100

Not all keys are present in both dictionaries, and each dictionary contains approximately 200,000 key-value pairs. I want to combine the two and produce an output text file with one line per key: if the key is in both dictionaries, I get both values, and if it is in only one, the missing value is shown as a dot. The output file format should look like:

ch:23   100   98
ch:24   95    .
ch:25   .     100

How can I do this?

Chris
jobrant

1 Answer


Note: unless you use an OrderedDict, a dictionary in Python 2 (and in Python 3 before 3.7) does not preserve order, so the order of your result may not match the order shown in your example.

Coming back to your example, if

>>> d1={'ch:23': 100, 'ch:24': 95}
>>> d2={'ch:23': 98 ,'ch:25': 100}

You can try this

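>>> import collections
>>> d3=collections.defaultdict(list)
>>> # list() makes the concatenation work on Python 3 as well as Python 2
>>> for k,e in list(d1.items())+list(d2.items()):
    d3[k].append(e)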
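
Concatenating the two item lists builds a temporary list of roughly 400,000 pairs for dictionaries of the size in the question. As a minimal sketch (not part of the original answer), `itertools.chain` from the standard library walks both dictionaries in turn without that copy, and it also runs unchanged on Python 3, where `items()` returns views that do not support `+`. The variable names mirror the example above.

```python
import collections
import itertools

d1 = {'ch:23': 100, 'ch:24': 95}
d2 = {'ch:23': 98, 'ch:25': 100}

d3 = collections.defaultdict(list)
# chain() iterates over d1's items, then d2's, without building a combined list
for key, value in itertools.chain(d1.items(), d2.items()):
    d3[key].append(value)

print(dict(d3))
```

Keys present in both dictionaries end up with two values, and keys present in only one end up with a single value.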

If you want to preserve the order, you need to create the original dictionaries as ordered dicts in the first place.

Then you can do the following:

>>> d1
OrderedDict([('ch:23', 100), ('ch:24', 95)])
>>> d2
OrderedDict([('ch:23', 98), ('ch:25', 100)])
>>> d3=collections.OrderedDict()
>>> for k,e in list(d1.items())+list(d2.items()):
    d3.setdefault(k,[]).append(e)
>>> d3
OrderedDict([('ch:23', [100, 98]), ('ch:24', [95]), ('ch:25', [100])])
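
One thing the merged lists do not record is which dictionary a single value came from: `ch:24` and `ch:25` both end up with a one-element list, although the question wants them in different columns. The following is a hedged sketch of one way to produce the requested file, looking each key up in both dictionaries and substituting a dot when it is absent. The helper names `merge_rows` and `write_merged`, the file name `merged.txt`, and the choice to sort the keys are assumptions made for illustration, not part of the original answer.

```python
def merge_rows(d1, d2, missing="."):
    """Yield (key, value_from_d1, value_from_d2) for every key in either dict."""
    # Union of both key sets, sorted so the output order is reproducible
    for key in sorted(set(d1) | set(d2)):
        # dict.get() returns the placeholder when the key is absent
        yield key, d1.get(key, missing), d2.get(key, missing)


def write_merged(path, d1, d2):
    """Write one tab-separated line per key to the file at `path`."""
    with open(path, "w") as out:
        for row in merge_rows(d1, d2):
            out.write("\t".join(str(field) for field in row) + "\n")


d1 = {'ch:23': 100, 'ch:24': 95}
d2 = {'ch:23': 98, 'ch:25': 100}

if __name__ == "__main__":
    write_merged("merged.txt", d1, d2)  # hypothetical output file name
```

Because each value is fetched by key from its own dictionary, a shared key with identical values in both dictionaries still yields two columns.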
Abhijit
  • Thank you for the OrderedDict. I needed to use this, and after installing Python 2.7.x it seemed to solve my problem. However, when the value was the same for a shared key between both dicts, the original value was overwritten so that in the output it appeared to have only one value. – jobrant Apr 30 '12 at 21:01
  • I think I solved this problem with some help from a coder friend. After making the OrderedDict: – jobrant Apr 30 '12 at 21:04
  • A friend helped with this. After making the OrderedDict:
        merged_keys=d1_keys + d2_keys
        merged_keys = uniq(merged_keys)
        print merged_keys
        print len(merged_keys)
        d3=collections.OrderedDict()
        output_doc = open("combo.txt","w+")
        for ch_pos in merged_keys:
            line_output = ch_pos
            if (d1.has_key(ch_pos)):
                line_output = line_output + "\t" + d1[ch_pos]
            else:
                line_output = line_output + "\t" + "ND"
            if (d2.has_key(ch_pos)):
                line_output = line_output + "\t" + d2[ch_pos]
            else:
                line_output = line_output + "\t" + "ND"
            output_doc.write(line_output + "\n")
    – jobrant Apr 30 '12 at 21:10
  • 1. OrderedDict doesn't support the + or += operators like dictionaries do, you can overcome this by using the update() method. 2. OrderedDict does not allow you to insert items, only append, which may be needed if you are focusing on order. This can be overcome by creating a new OrderedDict and inserting the item where you want, which is expensive. As an alternative you can override OrderedDict and add an insert() method. 3. I decided to just use lists. After bloating the code with an override of OrderedDict I decided the benefit was not worth the unreadable code. – Samuel Dec 18 '14 at 21:00