3

I have two lists of dictionaries in the format:

systolic_sex = [
        {'attribute': u'bp', 'value_d': 133.0, 'value_s': u'133', 'sid': 6}, 
        {'attribute': u'bp', 'value_d': 127.0, 'value_s': u'127', 'sid': 17}, 
        {'attribute': u'bp', 'value_d': 121.0, 'value_s': u'121', 'sid': 18}, 
        {'attribute': u'bp', 'value_d': 127.0, 'value_s': u'127', 'sid': 27}, 
        {'attribute': u'bp', 'value_d': 120.0, 'value_s': u'120', 'sid': 42},
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 6},      
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 17},   
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 18},
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 27},   
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 42}
    ]



sex = [
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 6},      
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 17},   
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 42}
    ]

I want to match these lists by the value of the key 'sid,' so that if the same value of 'sid' is in both, I have a match, otherwise, I do not. If I have a match, I then append the matching dictionaries by 'sid' from both sets to a new list accordingly like so

new_set = [
        {'attribute': u'bp', 'value_d': 133.0, 'value_s': u'133', 'sid': 6}, 
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 6},
        {'attribute': u'bp', 'value_d': 127.0, 'value_s': u'127', 'sid': 17}, 
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 17},
        {'attribute': u'bp', 'value_d': 120.0, 'value_s': u'120', 'sid': 42},
        {'attribute': u'SEX', 'value_d': 0.0, 'value_s': u'M', 'sid': 42}
    ]

I've tried various methods of intersecting these, including modifying answers from Match set of dictionaries, but I am looking to create a new list of dictionaries that have the matching sids, not replacing values between the two lists.

Community
  • 1
  • 1
horcle_buzz
  • 2,101
  • 3
  • 30
  • 59
  • Can you show what the desired output should look like in this case? Sounds like you effectively want to join the two "tables" on the `sid` "column"? – Amit Kumar Gupta Oct 25 '15 at 03:32
  • 2
    LOL the output was nothing like I imagined it for some reason. Deleted my answer. – woot Oct 25 '15 at 03:46
  • It was a nice remapping of the data! Useful for future efforts. – horcle_buzz Oct 25 '15 at 04:05
  • NB: The set sex could be of the form systolic_sex, that is, having more than one set of different values for the 'attribute' key across a single value of the 'sid' key. For example, I could have two sets, systolic_sex and diastolic_sex, and then do the match of all keys by 'sid'. Sorry if this was not clear in the original post. In any case, @dawg's solution worked for this, even though I did not make this requirement clear. – horcle_buzz Oct 25 '15 at 16:12

4 Answers4

3

You may be interested in using pandas if you're dealing with data like this a lot. Your dictionaries are already in the form pandas likes, so you can do this:

import pandas

systolic_sex = pandas.DataFrame(systolic_sex)
sex = pandas.DataFrame(sex)

matches = systolic_sex[systolic_sex.sid.isin(sex.sid)]

If you want the data back in the same format as you supplied them, you can to

output = matches.to_dict(orient='records')
chthonicdaemon
  • 19,180
  • 2
  • 52
  • 66
  • I've actually used Pandas for manipulating data and would have had I not been able to use a python only approach. I actually like this solution much better than the one I accepted as an answer, since it is more parsimonious, minus the extra import. – horcle_buzz Oct 25 '15 at 15:07
  • 1
    Yeah, I find that Pandas is becoming quite standard wherever I go, so I don't think it's much overhead for the vastly better handling of data like these. – chthonicdaemon Oct 25 '15 at 15:12
  • For what I had to do with it, it go a bit complicated. I will give it another chance. – horcle_buzz Oct 25 '15 at 15:17
  • Last step to convert DataFrame back to list of dictionaries is `matches.T.to_dict().values()` ... nice (you may have convinced me, especially since I have an upcoming project with a LOT of similar hoops to jump through)! – horcle_buzz Oct 25 '15 at 15:28
  • 1
    Oh, and I must comment that you are the rare breed of coder/data person I've seen actually use the correct subject-verb agreement for data! (**data are/datum is**) – horcle_buzz Oct 25 '15 at 15:33
  • Thanks. I'm a lecturer and pedantry goes with the territory. Note that if your data started life in json files (which it appears like) you can read directly into a dataframe with `pandas.read_json`, and the final step to convert is nicer as `matches.to_dict(orient='records')` – chthonicdaemon Oct 25 '15 at 15:38
  • Pedantry is great! Probably confounds many people, especially if you correct them, like I always do! (^_^) Thanks for the tip. I will give it a try. – horcle_buzz Oct 25 '15 at 15:45
1

Going off the answer in the post you linked:

systolic_sex = dict((e['sid'], e) for e in systolic_sex)
sex = set(e['sid'] for e in sex)

matches = []
for sid,v in systolic_sex.items():
    if sid not in sex: continue
    matches.append(v)
inspectorG4dget
  • 110,290
  • 27
  • 149
  • 241
  • I tried this based previous to edits, and it worked for the stated problem. However, I realized my first set should have been combined as systolic_sex. I have edited the original post accordingly. – horcle_buzz Oct 25 '15 at 03:47
  • Got it, partially, but I see where to go with it. Thanks for getting me back on track, and for teaching some new python tricks! (^_^) – horcle_buzz Oct 25 '15 at 04:04
1
>>> uniq=set(e['sid'] for e in sex) 
>>> filter(lambda d: d['sid'] in uniq, systolic_sex)
[{'attribute': u'bp', 'sid': 6L, 'value_s': u'133', 'value_d': 133.0},        
 {'attribute': u'bp', 'sid': 17L, 'value_s': u'127', 'value_d': 127.0},  
 {'attribute': u'bp', 'sid': 42L, 'value_s': u'120', 'value_d': 120.0}, 
 {'attribute': u'SEX', 'sid': 6L, 'value_s': u'M', 'value_d': 0.0}, 
 {'attribute': u'SEX', 'sid': 17L, 'value_s': u'M', 'value_d': 0.0}, 
 {'attribute': u'SEX', 'sid': 42L, 'value_s': u'M', 'value_d': 0.0}]
dawg
  • 98,345
  • 23
  • 131
  • 206
  • 1
    This is one of those cases where a list comprehension is both shorter and prettier: `[d for d in systolic_sex if d['sid'] in uniq]` – chthonicdaemon Oct 25 '15 at 15:14
  • I just tested this on two more generalized data sets than I gave in my example (with arbitrary duplicates on the value of the key 'sid,' but with more than two non-duplicate values on the key 'attribute', and it worked as anticipated). The pandas solution would more than likely need a bit of tweaking, since it gave the same results @inspectorG4dget's solution gave, which required some tweaking for the more general case. – horcle_buzz Oct 25 '15 at 15:42
  • To get to the general case, pandas was he easiest of the suggestions to use. – horcle_buzz Oct 26 '15 at 14:41
1

I ended up using the following (as per @chtohnicdaemon):

import pandas
#-----> code snipped here
#----->
# iterate over record sets returned by SQLAlchemy to populate list
    for result in query_right:
        data = {'sid': result.patient_sid,
                'value_s': result.string_value,
                'value_d': result.double_value,
                'attribute': result.attribute_value}

                result_right.append(data)

    for result in left_child:
        data = {'sid': result.patient_sid,
                'value_s': result.string_value,
                'value_d': result.double_value,
                'attribute': result.attribute_value}

                result_left.append(data)

# convert list of dictionaries to data frames
right = pandas.DataFrame(right_result)
left = pandas.DataFrame(left_result)

# get matches
matches_right  = right[right.sid.isin(left.sid)]
matches_left  = left[left.sid.isin(right.sid)]

# combine matched sets into single set
frames = [matches_right,matches_left]

# concatenate data, drop duplicates and convert back to a list of dictionaries
result = pd.concat(frames).drop_duplicates().to_dict(orient='records')

Worked like a charm!

horcle_buzz
  • 2,101
  • 3
  • 30
  • 59
  • 1
    `drop_duplicates` doesn't work in-place, so that line isn't actually doing anything here. You also don't need the extra call to `pandas.Dataframe` in the "get matches" area. You can just do `pandas.concat(frames).drop_duplicates().to_dict(orient='records')` for all the parts from "concatenate sets" on down. – chthonicdaemon Oct 26 '15 at 17:39
  • drop_duplicates worked from the python interpreter (I have yet to test it in my code). Other than that, I guess the extra data frames are indeed redundant, otherwise I like the single line. Mahalo! – horcle_buzz Oct 26 '15 at 18:27
  • 1
    You would have seen the result, but it is not stored. Few of the pandas functions work in place. – chthonicdaemon Oct 26 '15 at 18:36
  • 1
    I see you're reading from a db. You can do it directly with [pandas.read_sql](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_sql.html) – chthonicdaemon Oct 28 '15 at 14:44
  • Nice! Looks like I can do what I need to along the lines of reading `pandas.read_sql(query_right.statement, query_right.session.bind)` into a data frame... this will shave off a few clock ticks from processing time, especially when the returned record set is large, since I won't have to iterate through it to build the list. – horcle_buzz Oct 29 '15 at 02:18