I have two JSON databases. If "img_url" has a new value (one that is in the last JSON but not in the other), I want to print the URL or store it in a variable. The goal is simply to get a list of the new values. Input JSON:

last_data = [
{
    "objectID": 16240,
    "results": [
        {
            "img_url": "https://img.com/1.jpg"
        },
        {
            "img_url": "https://img.com/2.jpg"
        },
        {
            "img_url": "https://img.com/30.jpg"
        }
    ]
},
{
    "objectID": 16242,
    "results": [
        {
            "img_url": "https://img.com/1.jpg"
        },
        {
            "img_url": "https://img.com/2.jpg"
        },
        {
            "img_url": "https://img.com/3.jpg"
        }
    ]
},
# ... multiple other objectIDs
]

Second input:

second_data = [
{
    "objectID": 16240,
    "results": [
        {
            "img_url": "https://img.com/1.jpg"
        },
        {
            "img_url": "https://img.com/2.jpg"
        }
    ]
},
{
    "objectID": 16242,
    "results": [
        {
            "img_url": "https://img.com/1.jpg"
        },
        {
            "img_url": "https://img.com/2.jpg"
        }
    ]
},
# ... multiple other objectIDs
]

I want to output only the https://img.com/30.jpg and https://img.com/3.jpg URLs (it can be a list, because I have multiple objects), or store them in a variable.

My code:

# last file
for item_last in last_data:
    results_last = item_last["results"]
    if results_last is not []:
        for result_last in results_last:
            ccv_last = result_last["img_url"]
# second file
for item_second in second_data:
    results_second = item_second["results"]
    if results_second is not []:
        # loop in results
        for result_second in results_second:
            ccv_second = result_second["img_url"]

if ccv_last != ccv_second and ccv_last is not None:
    print(ccv_last)
lf_celine

1 Answer

If you are trying to find the difference between two lists, here it is. I have slightly modified your code to get the expected result.

#last file
ccv_last = []
for item_last in last_data:
    results_last = item_last["results"]
    if results_last:
        for result_last in results_last:
            ccv_last.append(result_last["img_url"])
#second file
ccv_second = []
for item_second in second_data:
    results_second = item_second["results"]
    if results_second:
        for result_second in results_second:
            ccv_second.append(result_second["img_url"])

# URLs present in the last file but not in the second one
diff_list = list(set(ccv_last) - set(ccv_second))
print(diff_list)

Output (the order may vary, because sets are unordered):

['https://img.com/30.jpg', 'https://img.com/3.jpg']
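If the order of the new URLs matters, a variant that walks the last file in order and only uses a set for the lookup may be easier to work with. This is just a sketch on the sample data from the question; `collect_urls` and `new_urls` are names made up for illustration, and it assumes every object has a "results" list of dicts with an "img_url" key.

```python
def collect_urls(data):
    """Flatten every "img_url" of every object into one list."""
    return [
        result["img_url"]
        for item in data
        for result in item.get("results", [])
    ]


def new_urls(last_data, second_data):
    """Return URLs found in last_data but not in second_data, in first-seen order."""
    seen = set(collect_urls(second_data))
    new = []
    for url in collect_urls(last_data):
        if url not in seen:
            seen.add(url)  # report each new URL only once
            new.append(url)
    return new


# Sample data from the question, shortened to the two objects shown there
last_data = [
    {"objectID": 16240, "results": [
        {"img_url": "https://img.com/1.jpg"},
        {"img_url": "https://img.com/2.jpg"},
        {"img_url": "https://img.com/30.jpg"},
    ]},
    {"objectID": 16242, "results": [
        {"img_url": "https://img.com/1.jpg"},
        {"img_url": "https://img.com/2.jpg"},
        {"img_url": "https://img.com/3.jpg"},
    ]},
]
second_data = [
    {"objectID": 16240, "results": [
        {"img_url": "https://img.com/1.jpg"},
        {"img_url": "https://img.com/2.jpg"},
    ]},
    {"objectID": 16242, "results": [
        {"img_url": "https://img.com/1.jpg"},
        {"img_url": "https://img.com/2.jpg"},
    ]},
]

diff_list = new_urls(last_data, second_data)
print(diff_list)
```

Note that this still compares all URLs globally, like the set version, not object by object.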

However, you could change your results model slightly for better performance; see below.

If you don't plan to add further keys to the dictionaries in the results list, you probably just want a plain list. So you can change dict -> list

from

...
"results": [
    {
        "img_url": "https://img.com/1.jpg"
    },
    {
        "img_url": "https://img.com/2.jpg"
    }
]
...

to just a list of URLs

...
"img_url_results": ["https://img.com/1.jpg","https://img.com/2.jpg"]
...

With this change you can skip one for loop.

# last file
ccv_last = []
for item_last in last_data:
    if item_last.get('img_url_results'):
        ccv_last.extend(item_last["img_url_results"])
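Put together for both files, the flattened model could be compared roughly like this. It is a minimal sketch that assumes the data has already been converted to the "img_url_results" shape suggested above; the `flatten` helper is a made-up name, and the result is sorted only to make the output order predictable.

```python
# Sample data rewritten in the flattened "img_url_results" shape (assumed)
last_data = [
    {"objectID": 16240, "img_url_results": [
        "https://img.com/1.jpg", "https://img.com/2.jpg", "https://img.com/30.jpg"]},
    {"objectID": 16242, "img_url_results": [
        "https://img.com/1.jpg", "https://img.com/2.jpg", "https://img.com/3.jpg"]},
]
second_data = [
    {"objectID": 16240, "img_url_results": [
        "https://img.com/1.jpg", "https://img.com/2.jpg"]},
    {"objectID": 16242, "img_url_results": [
        "https://img.com/1.jpg", "https://img.com/2.jpg"]},
]


def flatten(data):
    """Collect all URLs from every object; a missing or empty list is skipped."""
    urls = []
    for item in data:
        urls.extend(item.get("img_url_results", []))
    return urls


# Set difference, sorted so the printed order is stable
diff_list = sorted(set(flatten(last_data)) - set(flatten(second_data)))
print(diff_list)
```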
Shakeel
  • Thank you very much. The problem is that I have duplicates and I want to keep them, which is impossible with set()... How can I change that? – lf_celine May 04 '20 at 15:20
  • You can use a dictionary instead of a list; please have a look at this solution, it might help you: https://stackoverflow.com/a/41808831/9592801 – Shakeel May 04 '20 at 21:01
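For the duplicates question in the comments, one possible approach (not necessarily what the linked answer does) is `collections.Counter`, a dict subclass that counts occurrences. Subtracting two counters keeps only the positive counts, so a URL that appears twice in the last file and never in the second one is reported twice. The two lists below are made-up sample values, and this assumes "keep duplicates" means a multiset difference.

```python
from collections import Counter

# Made-up flattened URL lists; 3.jpg appears twice in the last file
ccv_last = [
    "https://img.com/1.jpg",
    "https://img.com/3.jpg",
    "https://img.com/3.jpg",
    "https://img.com/2.jpg",
]
ccv_second = ["https://img.com/1.jpg", "https://img.com/2.jpg"]

# Counter subtraction drops entries whose count falls to zero or below
diff_counts = Counter(ccv_last) - Counter(ccv_second)

# elements() repeats each URL as many times as its remaining count
diff_list = list(diff_counts.elements())
print(diff_list)
```

If a URL appears three times in the last file and once in the second, this reports it twice, which may or may not be the behaviour wanted.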