-1

I have a python script that uses the Get API to give me two json arrays

data1

{'result': [
    {'number': '0010041', 'month': 'January'}, 
    {'number': '0010042', 'month': 'March'}
    ]}

data2

{'result': [
    {'task': '0010041', 'time_left': '20 sec'}, 
    {'task': '0010042', 'time_left': '6 min'}
    ]}

I want to merge the two using the common entry so in this case 'number' and 'task' where they are the same merge the rest of the data in the array.

eg.

'number': '0010041', 'month': 'January', 'time_left': '20 sec'
'number': '0010042', 'month': 'March', 'time_left': '6 min'

How?

martineau
  • 119,623
  • 25
  • 170
  • 301
Tim Hannah
  • 172
  • 2
  • 4
  • 13

4 Answers4

2

Here's a way using pandas library:

import pandas as pd
from pandas.io.json import json_normalize

d1 = json_normalize(d1['result'])
d2 = json_normalize(d2['result'])

# merge the data
lst = d1.merge(d2, left_on='number', right_on='task').drop('task', axis=1)

# converting time to same units (seconds)
lst['time_secs'] = lst['time_left'].str.split().apply(lambda x: int(x[0])*60 if 'min' in x else int(x[0]))

# sort, select cols and convert to dictionary
lst = lst.sort_values('time_secs', ascending=True)[['number','month','time_left']].to_dict(orient='records')

[{'number': '0010041', 'month': 'January', 'time_left': '20 sec'},
 {'number': '0010042', 'month': 'March', 'time_left': '6 min'}]
YOLO
  • 20,181
  • 5
  • 20
  • 40
  • This works wonders, one last bit I forgot to mention. how can then sort it by time_left, lowest first? – Tim Hannah Jan 20 '20 at 22:38
  • check edit please, I assumed the time is always in `min` or `seconds`. if it's in hours, you need to convert it into seconds. – YOLO Jan 21 '20 at 06:33
  • So hypothetically its in hours... how would i do that? im happy for hours to be in minutes so 1 hour and 5 mins = 65 mins. Final question, promise, – Tim Hannah Jan 21 '20 at 10:48
  • Don't worry, I have added that condition to the API call to sort the data on Get. Thanks for your help. – Tim Hannah Jan 21 '20 at 11:44
1

This should do the job:

  d1 = data1['result']
  d2 = data2['result']
  merged_list = []
  for item1 in d1:
    for item2 in d2:
      if item1['number'] == item2['task']:
        merged_dict = {**item1, **item2}
        del(merged_dict['task'])
        merged_list.append(merged_dict)

  merged = {
    'result': merged_list
  }
Amir Kirsh
  • 12,564
  • 41
  • 74
  • This works wonders, one last bit I forgot to mention. how can then sort it by time_left, lowest first? – Tim Hannah Jan 20 '20 at 23:10
  • @TimHannah Note that the 'time_left' in your example is in different time units and is a string and not a number. '20 sec' should probably come before '6 min', but '20 min' should come after '6 min', right? I think this requirement deserves its own question. Which may lead to something like [this](https://stackoverflow.com/questions/35545091/convert-human-readable-time-difference-not-timestamp-to-something-usable-for-s) – Amir Kirsh Jan 21 '20 at 07:48
0

if the result list is always sorted, the following example can help:

results = []
for i in range(len(data1['results'])):
   results.append({**data1['results'][i], **data1['results'][i]})
YFrog
  • 109
  • 6
  • The results are not always sorted the same. However the information about '0010041' from data1 and data2 are about the same element so I wanted to use that and match up and merge. – Tim Hannah Jan 20 '20 at 21:25
0

You could approach this elementwise:

results = []
for i in range(len(data1['result'])):
    results.append({
        'number': data1[i]['number'],
        'month': data1[i]['month'],
        'time_left': data2[i]['time_left']
    })
jimsu2012
  • 170
  • 3