3

I have a column with 50983 rows. Each row holds a list containing two or more dictionaries. I want to merge all the dictionaries into a single dictionary, and I want to add the id to each dictionary. I used:

l=[{'id':'abc12vr'},{'createdAt': '2018-12-18T16:09:57.098Z',
  'notes': 'Candidate initial submission.',
  'createdBy': 'Steven Klinger'},
 {'createdAt': '2018-12-18T23:14:09.415Z',
  'notes': 'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>',
  'createdBy': 'Matt'},
 {'createdAt': '2019-01-22T16:04:46.958Z',
  'notes': 'The Candidate Status has now been updated from <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong> to <strong>Client CV Review</strong> and <strong>Position on Hold</strong>',
  'createdBy': 'Matt'},
 {'createdAt': '2018-12-18T16:09:57.098Z',
  'notes': 'Candidate initial submission.',
  'createdBy': 'Steven Klinger'},
 {'createdAt': '2018-12-18T23:14:09.415Z',
  'notes': 'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>',
  'createdBy': 'Matt'},
 {'createdAt': '2019-01-22T16:04:46.958Z',
  'notes': 'The Candidate Status has now been updated from <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong> to <strong>Client CV Review</strong> and <strong>Position on Hold</strong>',
  'createdBy': 'Matt'}]

id_dict = [d for d in l if 'id' in d][0]
merge = [{**d,**id_dict} for d in l if 'id' not in d]

But I am getting only the last row with a single dictionary; I wanted this for each row.

Atir
    What should the result look like? The dictionaries in your list share the same keys. How would you put them into one dictionary? – Alex Nov 12 '19 at 15:53

4 Answers

1

This is my debut answer on Stack Overflow and I hope it helps!

You get only the last row as a single dictionary because the keys of a dictionary must be unique. Since all the dictionaries share the same keys, Python keeps overwriting the earlier values.
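
To make the overwriting concrete, here is a minimal sketch with two made-up one-key dictionaries (the names `first`, `second` and `merged` are only for illustration):

```python
# Two dictionaries that share the same key
first = {"createdBy": "Steven Klinger"}
second = {"createdBy": "Matt"}

# Unpacking both into one dictionary keeps only the last value per key
merged = {**first, **second}
print(merged)  # {'createdBy': 'Matt'}
```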

The code below merges all the dictionaries into one, appending a counter value to each key to make the keys unique.

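merged_dict = {}
counter = 0

def merge_logic(dict_para):
    # Copy every key/value pair into merged_dict, suffixing the key with a running counter
    global counter
    for key, value in dict_para.items():
        merged_dict[key + "_" + str(counter)] = value
        counter += 1  # incremented once per key, so every merged key is unique

# Merge each dictionary of the list in turn
for d in l:
    if isinstance(d, dict):
        merge_logic(d)

print(merged_dict)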

Output:

    {'createdAt_11': '2018-12-18T16:09:57.098Z', 
'notes_0': 'Candidate initial submission.', 
'notes_3': 'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>', 
'createdAt_14': '2018-12-18T23:14:09.415Z', 
'createdAt_17': '2019-01-22T16:04:46.958Z', 
'notes_6': 'The Candidate Status has now been updated from <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong> to <strong>Client CV Review</strong> and <strong>Position on Hold</strong>', 
'notes_9': 'Candidate initial submission.', 
'createdBy_13': 'Matt', 
'notes_12': 'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>', 
'createdAt_5': '2018-12-18T23:14:09.415Z', 
'notes_15': 'The Candidate Status has now been updated from <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong> to <strong>Client CV Review</strong> and <strong>Position on Hold</strong>', 
'createdAt_2': '2018-12-18T16:09:57.098Z', 
'createdBy_4': 'Matt', 
'createdBy_7': 'Matt', 
'createdBy_1': 'Steven Klinger', 
'createdAt_8': '2019-01-22T16:04:46.958Z', 
'createdBy_10': 'Steven Klinger', 
'createdBy_16': 'Matt'}

Hope this helps!

Emmanuel-Lin
0

Seems like this answer should help (not sure though since you didn't provide the desired output):

d = {}
for i in l:
    for k in i.keys():
        # Collect the value of k from every dictionary that has this key
        d[k] = [item[k] for item in l if k in item]

{'createdAt': ['2018-12-18T16:09:57.098Z', '2018-12-18T23:14:09.415Z', '2019-01-22T16:04:46.958Z', '2018-12-18T16:09:57.098Z', '2018-12-18T23:14:09.415Z', '2019-01-22T16:04:46.958Z'], 'notes': ['Candidate initial submission.', 'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>', 'The Candidate Status has now been updated from <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong> to <strong>Client CV Review</strong> and <strong>Position on Hold</strong>', 'Candidate initial submission.', 'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>', 'The Candidate Status has now been updated from <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong> to <strong>Client CV Review</strong> and <strong>Position on Hold</strong>'], 'createdBy': ['Steven Klinger', 'Matt', 'Matt', 'Steven Klinger', 'Matt', 'Matt']}
help-ukraine-now
  • But there is no id. Oh, I forgot to tell you that there is an id in another column that needs to be inside this dictionary – Atir Nov 12 '19 at 16:07
  • @AamerAshfaque you should update your question in that case – help-ukraine-now Nov 12 '19 at 16:11
  • @politicalscientist this is really inefficient. For each dictionary, you pass over the entire list again. It's at least O(n^2) and is going to be bad for a moderately large list – Brian Nov 12 '19 at 18:36
  • @political scientist, please check now – Atir Nov 13 '19 at 08:42
0

This makes one pass over the data:

from collections import defaultdict

output_dict = defaultdict(list)

for d in l:
    for key in d:
        output_dict[key].append(d[key])

>>> output_dict

defaultdict(list,
            {'createdAt': ['2018-12-18T16:09:57.098Z',
              '2018-12-18T23:14:09.415Z',
              '2019-01-22T16:04:46.958Z',
              '2018-12-18T16:09:57.098Z',
              '2018-12-18T23:14:09.415Z',
              '2019-01-22T16:04:46.958Z'],
             'notes': ['Candidate initial submission.',
              'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>',
              'The Candidate Status has now been updated from <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong> to <strong>Client CV Review</strong> and <strong>Position on Hold</strong>',
              'Candidate initial submission.',
              'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>',
              'The Candidate Status has now been updated from <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong> to <strong>Client CV Review</strong> and <strong>Position on Hold</strong>'],
             'createdBy': ['Steven Klinger',
              'Matt',
              'Matt',
              'Steven Klinger',
              'Matt',
              'Matt']})
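
Since the updated question also has an `{'id': ...}` dictionary in the list, the same one-pass idea can be adapted to keep the id as a single value instead of a one-element list. This is only a sketch under that assumption; the function name `group_values`, its `id_key` parameter and the shortened sample data are made up for illustration:

```python
from collections import defaultdict

def group_values(records, id_key="id"):
    """Group values by key in one pass; keep the id (if any) as a single value."""
    grouped = defaultdict(list)
    record_id = None
    for record in records:
        for key, value in record.items():
            if key == id_key:
                record_id = value  # the id is stored once, not appended to a list
            else:
                grouped[key].append(value)
    result = dict(grouped)  # convert back to a plain dict
    if record_id is not None:
        result[id_key] = record_id
    return result

# Shortened sample in the shape of the question's list
sample = [
    {"id": "abc12vr"},
    {"createdAt": "2018-12-18T16:09:57.098Z", "createdBy": "Steven Klinger"},
    {"createdAt": "2018-12-18T23:14:09.415Z", "createdBy": "Matt"},
]
grouped = group_values(sample)
print(grouped)
```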
Brian
0

Original Answer

I have assumed you need each key mapped to a list of all the values for that key. Here I have used the dictionary's setdefault method to achieve it.

# Input
l=[{'createdAt': '2018-12-18T16:09:57.098Z',
  'notes': 'Candidate initial submission.',
  'createdBy': 'Steven Klinger'},
 {'createdAt': '2018-12-18T23:14:09.415Z',
  'notes': 'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>',
  'createdBy': 'Matt'},
 {'createdAt': '2019-01-22T16:04:46.958Z',
  'notes': 'The Candidate Status has now been updated from <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong> to <strong>Client CV Review</strong> and <strong>Position on Hold</strong>',
  'createdBy': 'Matt'},
 {'createdAt': '2018-12-18T16:09:57.098Z',
  'notes': 'Candidate initial submission.',
  'createdBy': 'Steven Klinger'},
 {'createdAt': '2018-12-18T23:14:09.415Z',
  'notes': 'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>',
  'createdBy': 'Matt'},
 {'createdAt': '2019-01-22T16:04:46.958Z',
  'notes': 'The Candidate Status has now been updated from <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong> to <strong>Client CV Review</strong> and <strong>Position on Hold</strong>',
  'createdBy': 'Matt'}]

# Main code
res = {}  # output dict
for i in l:  # iterate over each dict in the list
    for k, v in i.items():  # iterate over each key-value pair of the dict
        res.setdefault(k, []).append(v)  # create an empty list for a new key, then append the value
print(res)  # print result

# Output
# {'createdAt': ['2018-12-18T16:09:57.098Z', '2018-12-18T23:14:09.415Z', '2019-01-22T16:04:46.958Z', '2018-12-18T16:09:57.098Z', '2018-12-18T23:14:09.415Z', '2019-01-22T16:04:46.958Z'], 'notes': ['Candidate initial submission.', 'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>', 'The Candidate Status has now been updated from <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong> to <strong>Client CV Review</strong> and <strong>Position on Hold</strong>', 'Candidate initial submission.', 'The Candidate Status has now been updated from <strong>CV Submitted</strong> and <strong>Feedback Pending</strong> to <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong>', 'The Candidate Status has now been updated from <strong>Client CV Review</strong> and <strong>Feedback Awaiting</strong> to <strong>Client CV Review</strong> and <strong>Position on Hold</strong>'], 'createdBy': ['Steven Klinger', 'Matt', 'Matt', 'Steven Klinger', 'Matt', 'Matt']}

Modified Answer

# NOTE: "l" is one individual list from your data set.
value_for_id = "abc"  # value to set for id
for i in l:  # for each dict in l
    if i.get("id") is not None:  # check whether this dict has the key "id"
        i["id"] = value_for_id  # if so, update its value
        break  # stop after the first match
else:  # the loop ended without break, i.e. no dict has "id", so append a new one
    l.append({"id": value_for_id})

print (l)
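
If the goal is what the comments describe (copy the `'id'` dictionary into every other dictionary of the same list, for every row), one plausible reading is that the question's own comprehension needs to run once per row rather than on a single `l`. The following is a sketch under that assumption. The function name `attach_id`, the variable `rows` and the second row with its id `"xyz34kt"` are made up for illustration; `rows` stands in for the column as a plain list of lists.

```python
def attach_id(records, id_key="id"):
    """Return the non-id dictionaries of one row, each extended with the row's id."""
    # Find the dictionary that carries the id; None if the row has none
    id_dict = next((d for d in records if id_key in d), None)
    if id_dict is None:
        return [dict(d) for d in records]  # nothing to attach, return copies
    return [{**d, **id_dict} for d in records if id_key not in d]

# Hypothetical stand-in for the column: one list of dictionaries per row
rows = [
    [{"id": "abc12vr"},
     {"notes": "Candidate initial submission.", "createdBy": "Steven Klinger"}],
    [{"id": "xyz34kt"},
     {"notes": "First note", "createdBy": "Matt"},
     {"notes": "Second note", "createdBy": "Matt"}],
]

# Apply the function to every row instead of overwriting one result
merged_rows = [attach_id(row) for row in rows]
for merged in merged_rows:
    print(merged)
```

If the column lives in a pandas DataFrame, the same function could probably be passed to `Series.apply` on that column, but that depends on how the data is actually stored.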

I hope this helps and counts!

Akash Swain
  • Strange! Are you sure you are using the same input data as asked here? Your question has only one list assigned to `l`. If your input data has multiple lists then the above error can occur; in that case add one more for loop above `for i in l:` – Akash Swain Nov 13 '19 at 09:15
  • I just needed to update the dictionaries with the 'id': value dictionary that is in the same list. Yes, I have 50983 lists, one in each row. I just posted the first row of my column – Atir Nov 13 '19 at 09:31
  • Ok, so you have updated the question. Based on your updated question I have edited my answer. Please go through it and all other answers. If your query is resolved, I request you to close this Question. – Akash Swain Nov 13 '19 at 09:48