0

I am dealing with a database in which someone created a PHP ArrayObject that had virtually no checks in place before being created.

I am able to extract this as a dictionary of dictionaries using the `unserialize` function from the Python phpserialize library, so that it looks like this:

{0: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "2"}, "2": {"0": "design", "1": "2"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
 1: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "2"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "2"}}',
 2: '{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 3}, "2": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4}, "4": {"0": "environmental_stewardship", "1": 4}}',
 3: '{"0": {"0": "location", "1": "4"}, "1": {"0": "sizing", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "visual_appeal", "1": "4"}}',
 4: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": 0}, "4": {"0": "environmental_stewardship", "1": "2"}}',
 5: '{"0": {"0": "location", "1": "3"}, "1": {"0": "sizing", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "visual_appeal", "1": "3"}}',

...

56: '{"0": {"0": "visual_impact", "1": "2"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "1"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "1"}}',
 57: '{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 2}, "2": {"0": "design", "1": 1}, "3": {"0": "maintenance", "1": 2}, "4": {"0": "environmental_stewardship", "1": 2}}',
 58: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "4"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 59: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 60: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
 61: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
 62: '{"0": {"0": "visual_impact", "1": "2"}, "1": {"0": "plant_variety_and_health", "1": "2"}, "2": {"0": "design", "1": "2"}, "3": {"0": "maintenance", "1": "1"}, "4": {"0": "environmental_stewardship", "1": "1"}}',
 63: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 64: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 65: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}'}

The problem is that I need a way to extract the sub dictionaries that have the same values (e.g., all those with "visual_impact" or "color", etc.). However, since these sub dictionaries are not paired with the same key throughout the object, this does not seem possible.

I am thinking that maybe reassigning the key names to align with the values would be doable.

So, for example

dict = {"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 3}, "2": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4}, "4": {"0": "environmental_stewardship", "1": 4}}

Would instead become

dict = {"0": {"0": "color", "1": 3}, "4": {"0": "plant_variety", "1": 3}, "1": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4}, "2": {"0": "environmental_stewardship", "1": 4}}

Thus, for dict["0"] I want to always have "color" in the sub dictionary/value, dict["1"] would always have "design", etc. So, for my example dict above, dict["0"] would give {"0": "color", "1": 3}, dict["1"] would give {"0": "design", "1": 4}, etc.

Thus, I am trying to reassign the keys based on what is in the value/sub dictionary. Key "0" always has "color" in the sub dictionary/value, key "1" always has "design", etc. for the whole dictionary of dictionaries listed above.
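A minimal sketch of this idea, assuming one record has already been parsed into a real dict. Rather than reassigning positional keys, it keys each score by the criterion name itself; `rekey_by_name` is a hypothetical helper name, not part of any library.

```python
# One parsed record, copied from the example above.
record = {
    "0": {"0": "color", "1": 3},
    "1": {"0": "plant_variety", "1": 3},
    "2": {"0": "design", "1": 4},
    "3": {"0": "maintenance", "1": 4},
    "4": {"0": "environmental_stewardship", "1": 4},
}


def rekey_by_name(rec):
    """Return {criterion_name: score}, ignoring the positional keys."""
    return {sub["0"]: sub["1"] for sub in rec.values()}


by_name = rekey_by_name(record)
print(by_name["color"])   # 3
print(by_name["design"])  # 4
```

With this shape the position of a criterion inside a record no longer matters, only its name.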

I found the question "Change the name of a key in dictionary", but it is not clear how to apply it to this object, since the renaming depends on the content of the value/sub dictionary.

I know that I first have to make sure the values are uniformly named (for example, 'use_of_color' changed to 'color'), but that should not be a problem. I just need a way to ensure that I always extract the sub dictionary with the value 'color' by the same key, and the only way I can see of doing this is by reassigning the keys.
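The renaming step could look roughly like the sketch below. The names `ALIASES`, `normalize_name` and `normalize_record` are hypothetical, and the alias table holds only the one mapping mentioned above; any further mappings would have to be decided from the data.

```python
# Hypothetical alias table; only 'use_of_color' -> 'color' is taken from the text above.
ALIASES = {"use_of_color": "color"}


def normalize_name(name):
    """Map a known alias to its canonical name; leave other names unchanged."""
    return ALIASES.get(name, name)


def normalize_record(rec):
    """Return a copy of one parsed record with canonical criterion names."""
    return {
        pos: {"0": normalize_name(sub["0"]), "1": sub["1"]}
        for pos, sub in rec.items()
    }


record = {"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "design", "1": "2"}}
cleaned = normalize_record(record)
print(cleaned["0"]["0"])  # color
```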

If there is a better way to deal with this, I am open to suggestions.

horcle_buzz

2 Answers

2

I'm assuming you want a grouping of the sub-dictionaries by the value of their key '0', i.e. by 'location', 'environmental_stewardship', etc. But actually, you don't have subdictionaries at all, you have strings that are dictionary literals. If your dictionary were named horrible_mess, you could use this quick hack:

>>> from ast import literal_eval
>>> still_messy  = {k:literal_eval(v) for k,v in horrible_mess.items()}
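As an aside, the strings shown in the question look like JSON (double-quoted keys and values), so `json.loads` from the standard library may be another way to parse them. This is only a sketch under that assumption; the two rows below are abbreviated from the question's data.

```python
import json

# Two rows abbreviated from the question; the real dict has many more entries.
horrible_mess = {
    2: '{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 3}}',
    3: '{"0": {"0": "location", "1": "4"}, "1": {"0": "sizing", "1": "4"}}',
}

# Parse each JSON string into a real nested dict.
still_messy = {k: json.loads(v) for k, v in horrible_mess.items()}
print(still_messy[2]["0"])  # {'0': 'color', '1': 3}
```

If any row is not valid JSON, `json.loads` raises `json.JSONDecodeError`, which at least makes the bad rows easy to find.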

Then, it's probably easiest to simply do the following:

>>> from collections import defaultdict
>>> grouped = defaultdict(list)
>>> for sub in still_messy.values():
...     for d in sub.values():
...         grouped[d['0']].append(d)
... 
>>> grouped['visual_appeal']
[{'1': '4', '0': 'visual_appeal'}, {'1': '3', '0': 'visual_appeal'}]
>>> grouped['environmental_stewardship']
[{'1': '3', '0': 'environmental_stewardship'}, {'1': '2', '0': 'environmental_stewardship'}, {'1': 4, '0': 'environmental_stewardship'}, {'1': '2', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '1', '0': 'environmental_stewardship'}, {'1': 2, '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '1', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}]
>>> 
juanpa.arrivillaga
  • Yes, dictionary literals are not sub dictionaries. This is not how I envisioned doing this, but it is exactly what I need. – horcle_buzz Aug 19 '16 at 02:51
  • In terms of use case, I need to plot all 'visual_appeals,' 'environmental_stewardship,' etc. against each other (everything within the dictionary literal with a key of '1' is a score), so grouping these by key will help to better filter the key value pair within the dictionary literal. horrible_mess, indeed, but this will get me there. – horcle_buzz Aug 19 '16 at 03:04
  • `grouped = defaultdict([])` gives an error of `TypeError: first argument must be callable or None` ... the solution is to explicitly state the `list` type here, as `defaultdict(list)`. Otherwise this works well. – horcle_buzz Aug 19 '16 at 17:06
  • @horcle_buzz and actually, it takes any callable, so you could do something like `d = defaultdict(lambda: [1,2,3]); d[0].append(4)` and then `d[0]` is `[1,2,3,4]`. – juanpa.arrivillaga Aug 19 '16 at 17:43
  • @horcle_buzz That was just meant to be an example of the sorts of things you *can* do, not applicable to your situation necessarily. Try it in the interactive interpreter. – juanpa.arrivillaga Aug 19 '16 at 18:21
  • Yes, I figured that out after glancing at it (which is why I deleted my comment). I believe given this, I should be able to get what I need out of this ugly mess. – horcle_buzz Aug 19 '16 at 18:26
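For the plotting use case mentioned in the comments, the scores under key '1' are a mix of strings and integers, so they probably need converting first. The following is a rough sketch, assuming every score converts cleanly with `int`; the records are abbreviated from the question and `scores` is a hypothetical name.

```python
from collections import defaultdict

# A few parsed records in the same shape as still_messy (abbreviated).
still_messy = {
    2: {"0": {"0": "color", "1": 3}, "3": {"0": "maintenance", "1": 4}},
    4: {"0": {"0": "visual_impact", "1": "3"}, "3": {"0": "maintenance", "1": 0}},
    5: {"0": {"0": "location", "1": "3"}, "3": {"0": "maintenance", "1": "3"}},
}

# Group plain integer scores by criterion name.
scores = defaultdict(list)
for sub in still_messy.values():
    for d in sub.values():
        scores[d["0"]].append(int(d["1"]))  # '3' and 3 both become 3

print(scores["maintenance"])  # [4, 0, 3]
```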
1

Please let me know if this is the output you expect; if not, could you please include the expected output in your question? I apologize that the code is ugly. It loops through all of the sub-dictionaries, treats any value that cannot be converted to int as a category name (this is a hack), and uses those names as keys in a new dictionary.

CODE:

from ast import literal_eval
data = {0: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "2"}, "2": {"0": "design", "1": "2"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
 1: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "2"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "2"}}',
 2: '{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 3}, "2": {"0": "design", "1": 4}, "3": {"0": "maintenance", "1": 4}, "4": {"0": "environmental_stewardship", "1": 4}}',
 3: '{"0": {"0": "location", "1": "4"}, "1": {"0": "sizing", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "visual_appeal", "1": "4"}}',
 4: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": 0}, "4": {"0": "environmental_stewardship", "1": "2"}}',
 5: '{"0": {"0": "location", "1": "3"}, "1": {"0": "sizing", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "visual_appeal", "1": "3"}}',
56: '{"0": {"0": "visual_impact", "1": "2"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "1"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "1"}}',
 57: '{"0": {"0": "color", "1": 3}, "1": {"0": "plant_variety", "1": 2}, "2": {"0": "design", "1": 1}, "3": {"0": "maintenance", "1": 2}, "4": {"0": "environmental_stewardship", "1": 2}}',
 58: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "4"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 59: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "3"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 60: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
 61: '{"0": {"0": "use_of_color", "1": "3"}, "1": {"0": "plant_variety", "1": "4"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}',
 62: '{"0": {"0": "visual_impact", "1": "2"}, "1": {"0": "plant_variety_and_health", "1": "2"}, "2": {"0": "design", "1": "2"}, "3": {"0": "maintenance", "1": "1"}, "4": {"0": "environmental_stewardship", "1": "1"}}',
 63: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 64: '{"0": {"0": "visual_impact", "1": "4"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "4"}, "4": {"0": "environmental_stewardship", "1": "4"}}',
 65: '{"0": {"0": "visual_impact", "1": "3"}, "1": {"0": "plant_variety_and_health", "1": "3"}, "2": {"0": "design", "1": "3"}, "3": {"0": "maintenance", "1": "2"}, "4": {"0": "environmental_stewardship", "1": "3"}}'}


sep_by_type = {}
for key1, val1 in data.items():
    val1 = literal_eval(val1)  # val1 is a string, not a dict
    for key2, val2 in val1.items():
        for key3, val3 in val2.items():
            try:
                int(val3)  # scores convert cleanly; category names do not
            except ValueError:
                # val3 is a category name, so group the sub-dictionary under it
                sep_by_type.setdefault(val3, []).append(val2)

for sep_key in sep_by_type:
    print("%s %s" % (sep_key, sep_by_type[sep_key]))
    print("")

OUTPUT

sizing [{'1': '4', '0': 'sizing'}, {'1': '3', '0': 'sizing'}]

plant_variety [{'1': '2', '0': 'plant_variety'}, {'1': '2', '0': 'plant_variety'}, {'1': 3, '0': 'plant_variety'}, {'1': 2, '0': 'plant_variety'}, {'1': '4', '0': 'plant_variety'}, {'1': '4', '0': 'plant_variety'}]

maintenance [{'1': '2', '0': 'maintenance'}, {'1': '4', '0': 'maintenance'}, {'1': 4, '0': 'maintenance'}, {'1': '3', '0': 'maintenance'}, {'1': 0, '0': 'maintenance'}, {'1': '3', '0': 'maintenance'}, {'1': '2', '0': 'maintenance'}, {'1': '4', '0': 'maintenance'}, {'1': '2', '0': 'maintenance'}, {'1': 2, '0': 'maintenance'}, {'1': '3', '0': 'maintenance'}, {'1': '3', '0': 'maintenance'}, {'1': '2', '0': 'maintenance'}, {'1': '2', '0': 'maintenance'}, {'1': '1', '0': 'maintenance'}, {'1': '4', '0': 'maintenance'}]

use_of_color [{'1': '3', '0': 'use_of_color'}, {'1': '3', '0': 'use_of_color'}, {'1': '3', '0': 'use_of_color'}, {'1': '3', '0': 'use_of_color'}]

color [{'1': 3, '0': 'color'}, {'1': 3, '0': 'color'}]

plant_variety_and_health [{'1': '4', '0': 'plant_variety_and_health'}, {'1': '3', '0': 'plant_variety_and_health'}, {'1': '3', '0': 'plant_variety_and_health'}, {'1': '3', '0': 'plant_variety_and_health'}, {'1': '3', '0': 'plant_variety_and_health'}, {'1': '3', '0': 'plant_variety_and_health'}, {'1': '2', '0': 'plant_variety_and_health'}, {'1': '3', '0': 'plant_variety_and_health'}]

visual_appeal [{'1': '4', '0': 'visual_appeal'}, {'1': '3', '0': 'visual_appeal'}]

design [{'1': '2', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': 4, '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '1', '0': 'design'}, {'1': 1, '0': 'design'}, {'1': '4', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '3', '0': 'design'}, {'1': '2', '0': 'design'}, {'1': '3', '0': 'design'}]

location [{'1': '4', '0': 'location'}, {'1': '3', '0': 'location'}]

environmental_stewardship [{'1': '3', '0': 'environmental_stewardship'}, {'1': '2', '0': 'environmental_stewardship'}, {'1': 4, '0': 'environmental_stewardship'}, {'1': '2', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '1', '0': 'environmental_stewardship'}, {'1': 2, '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '3', '0': 'environmental_stewardship'}, {'1': '1', '0': 'environmental_stewardship'}, {'1': '4', '0': 'environmental_stewardship'}]

visual_impact [{'1': '3', '0': 'visual_impact'}, {'1': '3', '0': 'visual_impact'}, {'1': '4', '0': 'visual_impact'}, {'1': '2', '0': 'visual_impact'}, {'1': '4', '0': 'visual_impact'}, {'1': '3', '0': 'visual_impact'}, {'1': '2', '0': 'visual_impact'}, {'1': '4', '0': 'visual_impact'}]

UPDATE: I now use literal_eval instead of eval, because literal_eval is safer. (thanks juanpa.arrivillaga!)

mitoRibo
  • I think this would get me there eventually, but @juanpa.arrivillaga has the parsimonious solution. – horcle_buzz Aug 19 '16 at 02:50
  • I am trying now to embed the "master" key (viz., 0, 1, 2, ..., 65) in the grouped output as another key/value pair for use in uniquely identifying each sep_key. Not sure if yours or @juanpa.arrivillaga is easier to deal with doing this. – horcle_buzz Aug 19 '16 at 17:05
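One possible way to carry the "master" key along, sketched under the assumption that each grouped entry can be a small dict of its own. The field names `id` and `score` are hypothetical, and the two rows are abbreviated from the question's data.

```python
from ast import literal_eval
from collections import defaultdict

# Two rows abbreviated from the question's data.
data = {
    3: '{"0": {"0": "location", "1": "4"}, "4": {"0": "visual_appeal", "1": "4"}}',
    5: '{"0": {"0": "location", "1": "3"}, "4": {"0": "visual_appeal", "1": "3"}}',
}

grouped = defaultdict(list)
for master_key, raw in data.items():
    for sub in literal_eval(raw).values():
        # "id" and "score" are hypothetical field names chosen for this sketch.
        grouped[sub["0"]].append({"id": master_key, "score": sub["1"]})

print(grouped["visual_appeal"])
# [{'id': 3, 'score': '4'}, {'id': 5, 'score': '3'}]
```

Each entry then identifies which outer record it came from, so entries from different criteria can be matched up by `id`.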