0

enter image description here

I have a json string that looks like something like this:

[
  {
    'DigitalAsset': {
      'URL': **SOMEIMAGEURL**
      'DigitalAssetType': {
        'DigitalAssetTypeName': 'Building Image'
      },
      'URLType': {
        'URLTypeName': 'Secure URL'
      }
    },
    'PropertyDigitalAssetID': 18283747,
  },

I want to extract the image URL in pandas and have written the following code to achieve that

 df['url'] = df['PropertyDigitalAsset'].apply(lambda x: next(iter([a[
                                                                        'DigitalAsset'] for a in x if
                             a['DigitalAssetType']['DigitalAssetTypeName'] == 'Building Image']),
                       None)
        if type(x) == list else None)

But I am not sure what goes wrong that it doesn't work, is there a better way of doing this?

  • 2
    Welcome to Stackoverflow. Please take a moment to read [how-to-make-good-reproducible-pandas-examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Shubham Sharma Jun 18 '20 at 11:23

1 Answers1

0

I was able to extract the URL's using my code below. Note that I had to make some assumptions on if the list could contain more than one JSON (hence more than one URL).

You can safely ignore the lines that setup the dataframe, since you should already have yours made, I needed them to verify the results.

import pandas as pd

def geturls(image_list):
    URL_List = []
    for image in image_list:
        if image['DigitalAsset']['DigitalAssetType']['DigitalAssetTypeName'] == 'Building Image':
            URL_List.append(image['DigitalAsset']['URL'])

    return URL_List

Json_List_A = [
  {'DigitalAsset': {
    'URL': 'Random URL 1',
    'DigitalAssetType': {
        'DigitalAssetTypeName': 'Building Image'
      },
     'URLType': {
        'URLTypeName': 'Secure URL'
      }
    },
    'PropertyDigitalAssetID': 18283747,
  },
  {'DigitalAsset': {
    'URL': 'Random URL 2',
    'DigitalAssetType': {
        'DigitalAssetTypeName': 'Not a Building'
      },
     'URLType': {
        'URLTypeName': 'Secure URL'
      }
    },
    'PropertyDigitalAssetID': 13747,
  }
]


Json_List_B = [
  {
    'DigitalAsset': {
      'URL': 'Random URL A',
      'DigitalAssetType': {
        'DigitalAssetTypeName': 'Building Image'
      },
      'URLType': {
        'URLTypeName': 'Secure URL'
      }
    },
    'PropertyDigitalAssetID': 82747,
  },
    {
    'DigitalAsset': {
      'URL': 'Random URL B',
      'DigitalAssetType': {
        'DigitalAssetTypeName': 'Building Image'
      },
      'URLType': {
        'URLTypeName': 'Secure URL'
      }
    },
    'PropertyDigitalAssetID': 1827,
  }
  ]


data = {'PropertyID':  ['762', '771'],
        'PropertyDigitalAsset': [Json_List_A, Json_List_B],
        }

df = pd.DataFrame (data, columns = ['PropertyID','PropertyDigitalAsset'])

df['url'] = df['PropertyDigitalAsset'].apply(lambda x: geturls(x))

print(df)

The initial few lines are all setup to show that it works with sample data. (if your data happens to be different than what I have assumed leave a comment and I can alter the functions as needed)

I created a helper function to extract the URLs from each row and let the pandas apply function do the rest.

Karan Shishoo
  • 2,402
  • 2
  • 17
  • 32