
I have multiple JSON files which contain URLs to images. Each image is available in three formats:

  1. SD version - standard quality ("isThumbNail": false, "isHdImage": false)
  2. thumbnail - lowest quality ("isThumbNail": true, "isHdImage": false)
  3. HD - highest quality ("isThumbNail": false, "isHdImage": true)

It looks like this:

{
  "objectId": 1234,
  "imgCount": 3,
  "lotImages": [
    {
      "sequence": 1,
      "link": [
        {
          "url": "http://example.com/SD/image_1.jpg",
          "isThumbNail": false,
          "isHdImage": false
        },
        {
          "url": "http://example.com/THUMB/image_1.jpg",
          "isThumbNail": true,
          "isHdImage": false
        },
        {
          "url": "http://example.com/HD/image_1.jpg",
          "isThumbNail": false,
          "isHdImage": true
        }
      ]
    },
    {
      "sequence": 2,
      "link": [
        {
          "url": "http://example.com/SD/image_2.jpg",
          "isThumbNail": false,
          "isHdImage": false
        },
        {
          "url": "http://example.com/THUMB/image_2.jpg",
          "isThumbNail": true,
          "isHdImage": false
        },
        {
          "url": "http://example.com/HD/image_2.jpg",
          "isThumbNail": false,
          "isHdImage": true
        }
      ]
    },
    {
      "sequence": 3,
      "link": [
        {
          "url": "http://example.com/SD/image_3.jpg",
          "isThumbNail": false,
          "isHdImage": false
        },
        {
          "url": "http://example.com/THUMB/image_3.jpg",
          "isThumbNail": true,
          "isHdImage": false
        },
        {
          "url": "http://example.com/HD/image_3.jpg",
          "isThumbNail": false,
          "isHdImage": true
        }
      ]
    }
  ]
}
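In this structure, each format is identified only by the two boolean flags, so a small helper can turn a link entry into a label. This is a minimal sketch: the name classify_link and the labels 'HD', 'SD' and 'TH' are illustrative assumptions, missing flags are treated as false, and a link with both flags set (not one of the three documented formats) is reported as unknown.

```python
from typing import Optional


def classify_link(link: dict) -> Optional[str]:
    """Return 'HD', 'SD' or 'TH' for one link entry, or None for an unexpected combination."""
    # Missing flags count as False (an assumption; the example JSON always has both).
    is_thumb = bool(link.get("isThumbNail", False))
    is_hd = bool(link.get("isHdImage", False))
    if is_hd and not is_thumb:
        return "HD"
    if is_thumb and not is_hd:
        return "TH"
    if not is_hd and not is_thumb:
        return "SD"
    # Both flags true: not one of the three documented formats.
    return None


print(classify_link({"url": "http://example.com/HD/image_1.jpg",
                     "isThumbNail": False, "isHdImage": True}))  # HD
```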

I am trying to get the URL of the HD version of each image and append it to the images list. It may happen that an image has no HD version; if it is missing from the JSON, I want to download the SD version instead. And of course, it may also happen that there is only a thumbnail version of the image, or no image at all; in that case the code should return an empty value, or something else safe that will not break the program.
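The fallback described here (HD first, then SD, otherwise an empty value) can be expressed per image by collecting the URLs by format and then trying the preferred formats in order. A minimal sketch, assuming the JSON has already been parsed into Python dicts with the structure shown above; the function name preferred_url and the use of None as the "safe" empty value are assumptions. Thumbnails are deliberately ignored, so a thumbnail-only image yields None.

```python
from typing import Optional


def preferred_url(links: list) -> Optional[str]:
    """Pick the HD URL of one image, fall back to SD, else return None."""
    by_format = {}
    for link in links:
        hd = link.get("isHdImage", False)
        thumb = link.get("isThumbNail", False)
        if hd and not thumb:
            by_format.setdefault("HD", link["url"])  # keep the first HD link seen
        elif not hd and not thumb:
            by_format.setdefault("SD", link["url"])  # keep the first SD link seen
        # Thumbnail links are skipped on purpose.
    # Try the preferred formats in order; None if neither is present.
    return by_format.get("HD") or by_format.get("SD")


sequence = {
    "sequence": 1,
    "link": [
        {"url": "http://example.com/SD/image_1.jpg", "isThumbNail": False, "isHdImage": False},
        {"url": "http://example.com/THUMB/image_1.jpg", "isThumbNail": True, "isHdImage": False},
    ],
}
print(preferred_url(sequence["link"]))  # http://example.com/SD/image_1.jpg
```

Applied to each sequence's link list in turn, this gives one entry per image, with None marking images that have neither an HD nor an SD version.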

With this code, I am able to get the URLs of all HD images (the links where isHdImage is true):

import requests


def get_images(url):
    try:
        images = []
        response = requests.get(url)
        response.raise_for_status()
        data = response.json()

        for sequence in data['lotImages']:
            for link in sequence['link']:
                if link['isHdImage']:
                    images.append(link['url'])  # store the URL string of this HD link

        return images

    except requests.exceptions.HTTPError as err:
        print('HTTPError:', err)

But I am not sure how to reach the solution I have described above. Thank you for any advice.

saby

1 Answer

You could check the flag combinations for every image. If I understand correctly: if isHdImage and isThumbNail are both false, the image is the SD version; if only isHdImage is true, it is the HD version; and if only isThumbNail is true, it is a thumbnail. So you can do something like this:

import requests


def get_images(url):
    resolution_order = ['HD', 'SD', 'TH']  # a lower index means a better resolution
    try:
        images = []
        response = requests.get(url)
        response.raise_for_status()
        data = response.json()

        for sequence in data['lotImages']:
            # Start every sequence at the worst resolution, so any better
            # resolution found among its links replaces it. Resetting best
            # for each sequence means each image contributes exactly one URL.
            best = [resolution_order[-1], ""]

            for link in sequence['link']:

                if link['isHdImage'] and not link['isThumbNail']:
                    # it's an HD image
                    if resolution_order.index('HD') < resolution_order.index(best[0]):
                        best = ['HD', link['url']]

                elif not link['isHdImage'] and link['isThumbNail']:
                    # it's a thumbnail; since best starts at 'TH', this never
                    # replaces it, so a thumbnail-only image yields ""
                    if resolution_order.index('TH') < resolution_order.index(best[0]):
                        best = ['TH', link['url']]

                elif not link['isHdImage'] and not link['isThumbNail']:
                    # it's an SD image
                    if resolution_order.index('SD') < resolution_order.index(best[0]):
                        best = ['SD', link['url']]

            images.append(best[1])  # append the best URL found for this image

        return images

    except requests.exceptions.HTTPError as err:
        print('HTTPError:', err)

Explanation: the code loops over every link of every sequence in the JSON. The best list holds the resolution of the best link found so far at position 0 and the corresponding URL at position 1, while the resolution_order list defines which resolution is preferred.

For example, if the script first finds an SD image, it assigns ['SD', 'URL'] to best; the index of 'SD' in resolution_order is 1. If it then finds an HD image, the check resolution_order.index('HD') < resolution_order.index(best[0]) is True only if the index of 'HD' in resolution_order is lower than the index of the resolution currently stored in best[0], which in this case is 1. The index of 'HD' is 0, so best is replaced with ['HD', 'NEW HD URL']. This way, even if the links are not listed in order of resolution, you still get the best quality according to resolution_order.
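The same ranking idea can also be written with a dictionary of ranks instead of repeated resolution_order.index(...) calls, picking the lowest-ranked link of each sequence with min(). This is a sketch, not the answer's code: the names RANK, link_format and best_urls are hypothetical, the function takes the already-parsed list of sequences so it can be tried without a network request, and here a thumbnail is returned as a last resort (drop 'TH' from RANK if thumbnail-only images should yield "").

```python
RANK = {"HD": 0, "SD": 1, "TH": 2}  # lower is better, mirroring resolution_order


def link_format(link: dict) -> str:
    """Map the two flags to a format label; '' for an unexpected combination."""
    hd = link.get("isHdImage", False)
    thumb = link.get("isThumbNail", False)
    if hd and not thumb:
        return "HD"
    if thumb and not hd:
        return "TH"
    if not hd and not thumb:
        return "SD"
    return ""  # not in RANK, so it is filtered out below


def best_urls(sequences: list) -> list:
    urls = []
    for sequence in sequences:
        # The choice starts over for every sequence, so each image gives one URL.
        candidates = [link for link in sequence.get("link", []) if link_format(link) in RANK]
        if candidates:
            best = min(candidates, key=lambda link: RANK[link_format(link)])
            urls.append(best["url"])
        else:
            urls.append("")  # safe empty value when no usable link exists
    return urls


sample = [
    {"sequence": 1, "link": [
        {"url": "http://example.com/SD/image_1.jpg", "isThumbNail": False, "isHdImage": False},
        {"url": "http://example.com/HD/image_1.jpg", "isThumbNail": False, "isHdImage": True},
    ]},
    {"sequence": 2, "link": [
        {"url": "http://example.com/THUMB/image_2.jpg", "isThumbNail": True, "isHdImage": False},
    ]},
    {"sequence": 3, "link": []},
]
print(best_urls(sample))
# ['http://example.com/HD/image_1.jpg', 'http://example.com/THUMB/image_2.jpg', '']
```

Because the links are ranked by a lookup rather than by their position in the JSON, the order in which HD, SD and thumbnail links appear does not matter.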

Ax34
  • Hello, yes, you understand correctly. I am really grateful for your advice and help. It makes sense and I think this solution is really great. One more thing: I have tried it and, I don't know why, but I am getting ```IndexError: list index out of range``` on this line: ```if resolution_order.index('SD') < resolution_order.index(best[0]):``` – saby Jan 04 '21 at 11:59
  • @Desttro I'm sorry, I only noticed it now: in the declaration of 'best' you have to set it to the worst quality. The error occurs because, the first time it checks for the best quality, it reads index 0 of an empty list, which raises the error. I'll fix the answer right now. – Ax34 Jan 04 '21 at 12:12
  • I have just found that it appends the same image again and again, so the list contains the same image URL multiple times. It returns: ```['http://example.com/hd_image_1.jpg', 'http://example.com/hd_image_1.jpg', 'http://example.com/hd_image_1.jpg']``` but it should return image_1.jpg, image_2.jpg, image_3.jpg, ... – saby Jan 05 '21 at 12:32
  • If the JSON contains multiple sequences with the same images, that's a different problem: you could simply check the list for duplicates and remove the extra ones. Otherwise it should put only one link in the list for every sequence. [this](https://stackoverflow.com/questions/7961363/removing-duplicates-in-the-lists) can help – Ax34 Jan 05 '21 at 14:57
  • The result is different: the returned variable ```images``` contains only the first HD URL from my example data, ```http://example.com/HD/image_1.jpg```, and no other. It seems that it checks only the first "link" in the JSON and adds it to the list, ignoring the others. I will try to figure out how to solve that. – saby Jan 05 '21 at 15:42
  • I have figured it out: the definition ```best = [resolution_order[-1], ""]``` must come after the first for statement ```for sequence in data['lotImages']:```, not before it. After that change, it returns the correct values. So it is solved by this; accepting the answer, thanks! – saby Jan 05 '21 at 23:32
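For the duplicate removal suggested in the comments above, dict.fromkeys keeps the first occurrence of each URL while preserving order (dicts keep insertion order as a language guarantee since Python 3.7). A minimal sketch; the name dedupe is hypothetical. Note that it only hides the symptom: if the duplicates come from best not being reset for each sequence, fixing the loop is the real solution, and deduplicating would also collapse several "" placeholders into one.

```python
def dedupe(urls: list) -> list:
    """Remove repeated URLs, keeping the first occurrence and the original order."""
    return list(dict.fromkeys(urls))


print(dedupe([
    "http://example.com/HD/image_1.jpg",
    "http://example.com/HD/image_1.jpg",
    "http://example.com/HD/image_2.jpg",
]))
# ['http://example.com/HD/image_1.jpg', 'http://example.com/HD/image_2.jpg']
```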