
I have data like --

sample 1, domain 1, value 1
sample 1, domain 2, value 1
sample 2, domain 1, value 1
sample 2, domain 3, value 1

-- stored in a dictionary --

dict_1 = {('sample 1','domain 1'): value 1, ('sample 1', 'domain 2'): value 1} 

-- etc.

Now, I have a different kind of value, named value 2 --

sample 1, domain 1, value 2
sample 1, domain 2, value 2
sample 2, domain 1, value 2
sample 2, domain 3, value 2

-- which I again put in a dictionary,

dict_2 = {('sample 1','domain 1'): value 2, ('sample 1', 'domain 2'): value 2}

How can I merge these two dictionaries in Python? The keys, for instance ('sample 1', 'domain 1'), are the same for both dictionaries.

I expect it to look like --

final_dict = {('sample 1', 'domain 1'): (value 1, value 2), ('sample 1', 'domain 2'): (value 1, value 2)}

-- etc.

TallTed
Dymphy
  • what do you expect these two dictionaries "merged" should look like? – Paritosh Singh Jan 08 '19 at 14:57
  • Possible duplicate of [How to merge two dictionaries in a single expression?](https://stackoverflow.com/questions/38987/how-to-merge-two-dictionaries-in-a-single-expression) – Jared Smith Jan 08 '19 at 14:58
  • @JaredSmith: Not a duplicate (of that question in any event); this one seems to want to preserve the values from both `dict`s, not keep the last value for a given key. – ShadowRanger Jan 08 '19 at 14:58
  • What do you mean by "merge"? What is the expected output (concrete example)? – Him Jan 08 '19 at 14:59
  • So what do you want the merged dict to do with the collision of keys? Should ('sample 1', 'domain 1') be mapped to value 1 or value 2? – Endyd Jan 08 '19 at 14:59
  • Both values should be added to the same key. – Dymphy Jan 08 '19 at 14:59
  • Can you use a reduceByKey type operation directly in your query? Use a lambda to add one-element lists containing those values together. One option... – blacksite Jan 08 '19 at 15:02
  • Without seeing both SPARQL queries, it's impossible to say whether a single query would be possible ... – UninformedUser Jan 08 '19 at 15:14
  • For each datapoint, I need the sample name, protein domain and two scores attached to it. I'm sorry I cannot be more specific. – Dymphy Jan 08 '19 at 15:53
  • *it is not possible to combine 2 queries.* — `UNION`? – Stanislav Kralin Jan 08 '19 at 19:46
  • While the answers below do answer your question about "merging dictionaries in Python" (and suggest SPARQL should be removed from the question tags), I think you have fallen into the trap of the [XY Problem](https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem#66378). It is nearly certain that you could use a single SPARQL query to get the four solution columns you desire (sample name, protein domain, score 1, score 2), but we cannot tell you how, without seeing the two queries you're running now. – TallTed Jan 09 '19 at 22:13
  • Maybe, but that is not the scope of this question. – Dymphy Jan 10 '19 at 15:37

4 Answers

2

The closest you're likely to get to this would be a dict of lists (or sets). For simplicity, you usually go with collections.defaultdict(list) so you're not constantly checking if the key already exists. You need to map to some collection type as a value because dicts have unique keys, so you need some way to group the multiple values you want to store for each key.

from collections import defaultdict

final_dict = defaultdict(list)

for d in (dict_1, dict_2):
    for k, v in d.items():
        final_dict[k].append(v)

Or equivalently with itertools.chain, you keep the defaultdict import and the final_dict = defaultdict(list) initialization from above and just change the loop to:

from itertools import chain

for k, v in chain(dict_1.items(), dict_2.items()):
    final_dict[k].append(v)

Side-note: If you really need a plain dict at the end, and/or insist on the values being tuples rather than lists, a final pass can do the conversion:

final_dict = {k: tuple(v) for k, v in final_dict.items()}
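
To make the pieces above easier to try out, here is a minimal end-to-end sketch that combines the defaultdict grouping, itertools.chain, and the final tuple conversion. The numeric scores and the intermediate name `grouped` are assumptions made up for illustration, and the two dictionaries deliberately do not share all keys.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical scores; the real values would come from the asker's own data.
dict_1 = {("sample 1", "domain 1"): 0.5, ("sample 1", "domain 2"): 0.7}
dict_2 = {("sample 1", "domain 1"): 12, ("sample 2", "domain 3"): 30}

# Collect every value seen for a key, in the order the dicts are chained.
grouped = defaultdict(list)
for key, value in chain(dict_1.items(), dict_2.items()):
    grouped[key].append(value)

# Convert to a plain dict of tuples once all values are collected.
final_dict = {key: tuple(values) for key, values in grouped.items()}
print(final_dict)
```

Note that a key present in only one dictionary ends up with a one-element tuple, so the position inside the tuple no longer tells you which dictionary a value came from.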
ShadowRanger
  • This would be the cleanest way to do it without suffering from KeyErrors. +1 – Paritosh Singh Jan 08 '19 at 15:07
  • The first solution seems to be working for me (strangely, the second does not). I merely need to have the values be ready for plotting against each other. Massive thanks! – Dymphy Jan 08 '19 at 15:48
  • @Dymphy: To be clear, the second block should still include the import for `defaultdict` and the initialization of the empty `final_dict` from the first block. I omitted them for brevity, but `final_dict` still needs to be a `defaultdict(list)` in both cases. The second block is just showing a way to reduce the level of loop nesting. If adding the imports and initial definition of `final_dict` doesn't make the second loop work, let me know (and provide the error), because it should be exactly equivalent. – ShadowRanger Jan 08 '19 at 17:41
  • Yes, that was indeed the problem. Thank you for your help! – Dymphy Jan 09 '19 at 16:07
1

You can use set intersection of keys to do this:

dict_1 = {('sample 1','domain 1'): 'value 1', ('sample 1', 'domain 2'): 'value 1'} 
dict_2 = {('sample 1','domain 1'): 'value 2', ('sample 1', 'domain 2'): 'value 2'} 

result = {k: (dict_1.get(k), dict_2.get(k)) for k in dict_1.keys() & dict_2.keys()}

print(result)
# {('sample 1', 'domain 1'): ('value 1', 'value 2'), ('sample 1', 'domain 2'): ('value 1', 'value 2')}

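The above uses dict.get() so that a missing key would yield None instead of raising a KeyError. Because the comprehension only iterates over keys found in both dictionaries, that case cannot actually arise here, so the calls are purely defensive.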

As @ShadowRanger suggests in the comments, if you iterate over the union of the keys instead, a key missing from one dictionary can fall back to the value from the other dictionary:

{k: (dict_1.get(k, dict_2.get(k)), dict_2.get(k, dict_1.get(k))) for k in dict_1.keys() | dict_2.keys()}
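
As a rough illustration of how the choice between `&` and `|` changes the result, the sketch below runs both on two dictionaries that only partly overlap. The scores and the names `both` and `either` are hypothetical, and the union variant uses the default None from dict.get() rather than the fallback shown above, so the position in each tuple still identifies the source dictionary.

```python
# Hypothetical scores; the keys only partly overlap on purpose.
dict_1 = {("sample 1", "domain 1"): 0.5, ("sample 1", "domain 2"): 0.7}
dict_2 = {("sample 1", "domain 1"): 12, ("sample 2", "domain 3"): 30}

# Intersection: only keys present in both dictionaries survive.
both = {k: (dict_1[k], dict_2[k]) for k in dict_1.keys() & dict_2.keys()}

# Union: every key from either dictionary; None marks a missing value.
either = {k: (dict_1.get(k), dict_2.get(k)) for k in dict_1.keys() | dict_2.keys()}

print(both)
print(either)
```

Keeping None as a placeholder may be the more convenient option if the pairs are later plotted against each other, since incomplete points can be filtered out explicitly.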
RoadRunner
  • @RoadRunner: Or for amusement (or seriously for some really esoteric scenarios), you could make each `get`'s default the value from the other `dict`, so if only one dict has the key, you get a `tuple` with the same value twice: `{k: (dict_1.get(k, dict_2.get(k)), dict_2.get(k, dict_1.get(k))) for k in dict_1.keys() | dict_2.keys()}` :-) A little wasteful, but harmless. – ShadowRanger Jan 08 '19 at 15:09
0

Does something handcrafted like this work for you?

dict3 = {}
# Assumes dict1 and dict2 have the same keys; a key missing from dict2 raises KeyError.
for i in dict1:
    dict3[i] = (dict1[i], dict2[i])
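
Since the loop above depends on both dictionaries having identical keys, one possible variation is to check that up front and fail with a clearer message. This is only a sketch: the helper name `merge_strict` and the sample scores are hypothetical and not part of the original answer.

```python
def merge_strict(first, second):
    """Pair up values from two dicts that must share exactly the same keys."""
    # Symmetric difference: keys that appear in only one of the two dicts.
    mismatched = first.keys() ^ second.keys()
    if mismatched:
        raise ValueError(f"keys present in only one dict: {sorted(mismatched)}")
    return {key: (first[key], second[key]) for key in first}


# Hypothetical scores for illustration.
dict_1 = {("sample 1", "domain 1"): 0.5, ("sample 1", "domain 2"): 0.7}
dict_2 = {("sample 1", "domain 1"): 12, ("sample 1", "domain 2"): 30}

merged = merge_strict(dict_1, dict_2)
print(merged)
```

Raising early avoids a bare KeyError halfway through the loop and reports every mismatched key at once.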
jimifiki
  • Sorry for that. Have you tried to replace dict1 and dict2 with your dictionaries name? Are you sure that the two dictionaries have the same keys? – jimifiki Jan 08 '19 at 16:43
-1
from collections import defaultdict
from itertools import chain
dict_1 = {('sample 1','domain 1'): 1, ('sample 1', 'domain 2'): 2} 
dict_2 = {('sample 1','domain 1'): 3, ('sample 1', 'domain 2'): 4}

new_dict_to_process = defaultdict(list)
dict_list = [dict_1.items(), dict_2.items()]
# chain(*dict_list) yields the (key, value) pairs of each dict in turn
for k, v in chain(*dict_list):
    new_dict_to_process[k].append(v)

Printing new_dict_to_process then shows (Python 3):

defaultdict(<class 'list'>, {('sample 1', 'domain 1'): [1, 3], ('sample 1', 'domain 2'): [2, 4]})
ShadowRanger
mad_