I have JSON in the following format. When several entries share the same "id", I need to merge them into one entry: the "id" stays a single value, and each of the other fields becomes a list of the values from the merged entries, with the keys unchanged. I tried looping over the data and adapting other sample code, but I couldn't get the required result. I also tried adding values to a new dictionary based on the 'id' field, but the result was either just the last value or something like this:

[  
    {  
        "time":" all dates ",
        "author_id":"alll ",
        "id_number":"all id_number",
        "id":"all idd"
    }
]

Received JSON:

data = [  
    {  
        "time":"2015/03/27",
        "author_id":"abc_123",
        "id":"4585",
        "id_number":123
    },
    {  
        "time":"2015/03/30",
        "author_id":"abc_123",
        "id":"7776",
        "id_number":122
    },
    {  
        "time":"2015/03/22",
        "author_id":"abc_123",
        "id":"8449",
        "id_number":111
    },
    {  
        "time":"2012/03/30",
        "author_id":"def_456",
        "id":"4585",
        "id_number":90
    }
]

Required Output:

new_data = [
    {
        "time":[
            "2015/03/27",
            "2012/03/30"
        ],
        "author_id":[
            "abc_123",
            "def_456"
        ],
        "id":"4585",
        "id_number":[
            123,
            90
        ]
    },
    {
        "time":"2015/03/30",
        "author_id":"abc_123",
        "id":"7776",
        "id_number":122
    },
    {
        "time":"2015/03/22",
        "author_id":"abc_123",
        "id":"8449",
        "id_number":111
    }
]
  • I tried using the code from http://stackoverflow.com/questions/2365921/merging-python-dictionaries . I tried something like: output = {k:[d.get(k) for d in data] for k in {k for d in data for k in d}} but I didn't get the required output. – still_learning Aug 18 '15 at 15:00
  • So what did you get, and how does it differ from what you want? – Vincent Beltman Aug 18 '15 at 15:17
  • I get something like this: {'time': ['2015/03/27', '2015/03/30', '2015/03/30'], 'author_id': ['abc_123', 'abc_123', 'def_456'],'id': ['123', '122', '111', '90',], 'id_number': ['4585', '7776', '4585', '8449',]} . But I want JSON that is grouped by 'id'. If the ids are the same, then the corresponding values should be combined. – still_learning Aug 18 '15 at 17:59
  • Please post your code. – saulspatz Aug 18 '15 at 18:36

1 Answer

A first step could be to create a more regular structure: map each id to a dictionary in which every key is mapped to a list of the corresponding values, merging the original dictionaries that share the same id value.

Then, in a second step, create the result list from the values of that id-to-merged-dictionary mapping. Depending on the length of the value lists, either copy the dictionary over as it is or take the only element out of each list while copying. And that's it.

#!/usr/bin/env python
# coding: utf8
from __future__ import absolute_import, division, print_function
from collections import defaultdict
from functools import partial
from pprint import pprint


def main():
    records = [
        {
            'time': '2015/03/27',
            'author_id': 'abc_123',
            'id': '4585',
            'id_number': 123
        },
        {
            'time': '2015/03/30',
            'author_id': 'abc_123',
            'id': '7776',
            'id_number': 122
        },
        {
            'time': '2015/03/22',
            'author_id': 'abc_123',
            'id': '8449',
            'id_number': 111
        },
        {
            'time': '2012/03/30',
            'author_id': 'def_456',
            'id': '4585',
            'id_number': 90
        }
    ]

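    # Step 1: map each id to a merged record in which every key holds
    # a list of all values seen for that id.
    id2record = defaultdict(partial(defaultdict, list))
    for record in records:
        merged_record = id2record[record['id']]
        # items()/values() work on Python 2 and 3 (iteritems is 2 only).
        for key, value in record.items():
            merged_record[key].append(value)

    # Step 2: unwrap single-element lists; keep lists for merged ids.
    result = list()
    for record in id2record.values():
        if len(record['id']) == 1:
            result.append(dict((k, vs[0]) for k, vs in record.items()))
        else:
            # The id itself is always a single value, never a list.
            record['id'] = record['id'][0]
            result.append(dict(record))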

    pprint(result)


if __name__ == '__main__':
    main()

If you can change the requirements for the output, I would suggest getting rid of the irregularity in the values. Code for processing the result has to deal with both cases (single values and lists of values), which makes it a little more complicated than it has to be.
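
To illustrate the more uniform alternative, here is a minimal sketch, assuming the output requirements may be changed so that every field except the id is always a list. The function name merge_by_id and its key parameter are made up for this example and are not part of the code above.

```python
from collections import defaultdict


def merge_by_id(records, key='id'):
    """Group records by `key`; every other field always becomes a list."""
    merged = {}
    for record in records:
        # One merged dictionary per distinct id, created on first sight.
        fields = merged.setdefault(record[key], defaultdict(list))
        for field, value in record.items():
            if field != key:
                fields[field].append(value)

    result = []
    for id_value, fields in merged.items():
        entry = dict(fields)
        # The id is stored once, as a single value.
        entry[key] = id_value
        result.append(entry)
    return result


records = [
    {'time': '2015/03/27', 'author_id': 'abc_123', 'id': '4585', 'id_number': 123},
    {'time': '2015/03/30', 'author_id': 'abc_123', 'id': '7776', 'id_number': 122},
    {'time': '2012/03/30', 'author_id': 'def_456', 'id': '4585', 'id_number': 90},
]

for entry in merge_by_id(records):
    # No special case needed: 'time' is a list for every entry.
    print(entry['id'], len(entry['time']))
```

With this shape, code that consumes the result can always iterate over the values and never has to check whether it got a single value or a list.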

Update: Fixed a problem in the code. The id value should always be a single value and never a list.

BlackJack
  • This code gives me the required output. Can you please explain what you mean by "irregularity in the values"? I have two Python scripts which return two different values, and they only have "id" in common. What do you suggest I do in order to get the required output? – still_learning Aug 24 '15 at 13:31
  • Meanwhile, I wrote a function like this to get the required output. newDicNotUnique = [] #Removing multiple repeated values from list for eachElement in data : if eachElement not in newDicNotUnique: newDicNotUnique.append(eachElement) #updating the dictionary based on the 'number' field. new_data = {eachElement['number']:eachElement for eachElement in newDicNotUnique}.values() #print json.dumps(finalDic) testTrackDefectDetailss =json.dumps(finalDic) – still_learning Aug 24 '15 at 13:35
  • @still_learning Your required output is a list of dictionaries with two different kinds of structure, which requires the code that processes this data to discriminate between values that are simple strings and values that are lists of strings. A more uniform structure would mean simpler code that doesn't have to deal with special cases. – BlackJack Aug 24 '15 at 16:29
  • Thank you for pointing it out. So the above code returns something like 'id': ['4585', '4585']. I just want to know whether it is possible to achieve what I am trying to do (just 'id': '4585' instead of 'id': ['4585', '4585']). – still_learning Aug 25 '15 at 16:52