1

I am downloading some data from BestBuy Products API using python's Requests library and I want to store them into pandas dataframe.

Something like that:

results = requests.get(url1, 
                  params={'paramStuff'}, 
                  headers={'User-Agent': ua})
products = json.loads(results.text)

A get a lot of various fields with service info, thus I aim only for specific field in JSON which I want:

products['products']

I have:

[{'details':[{'name': 'Name of Feature', 'value':'Value Of Feature'},
             {'name': 'Name of Other Feature', 'value':'Value Of Other 
              Feature'}, ...],
   'ProductId': 'Id Of Product 1',
   'Some Other Field': 'Some Other Field Value'},
 {same structure as above for other product}, {etc}]

So as you see it is something like a list of dictionaries which in turn contain lists of dictionaries themselves. To highlight - details dict can have various list of combinations of Name: Value (names are different across products as well).

Any idea on how to approach such structure to get into dataframe with such format:

+-----------+-------------------+-------------------+-------------------+------------------+
| ProductID | Name of Feature 1 | Name of Feature 2 | Name Of Feature 3 | Some Other Field |
+-----------+-------------------+-------------------+-------------------+------------------+
| Product 1 | Value             | NULL              | Value             | Value            |
| Product 2 | NULL              | Value             | Value             | Value            |
+-----------+-------------------+-------------------+-------------------+------------------+

So far I only managed to get to something like this:

+-----------+-----------------------------------------------------------------------------------------------------------------------------------+------------------+
| ProductID |                                                              Details                                                              | Some Other Field |
+-----------+-----------------------------------------------------------------------------------------------------------------------------------+------------------+
| Product 1 | [{'name': 'Name of Feature', 'value':'Value Of Feature'},{'name': 'Name of Other Feature', 'value':'Value Of Other Feature'},...] | Value 1          |
| Product 2 | [{'name': 'Name of Feature', 'value':'Value Of Feature'},{'name': 'Name of Other Feature', 'value':'Value Of Other Feature'},...] | Value 2          |
+-----------+-----------------------------------------------------------------------------------------------------------------------------------+------------------+
Maksim Khaitovich
  • 4,742
  • 7
  • 39
  • 70

1 Answers1

1

Ok, I ended up developing a way to manually parse nested field. Didn't figure out if there is any easy way. FYI this was used to parse the response from BestBuy Products API, in case anyone finds it useful.

#first build the pandas DF shown in question
df = pd.io.json.json_normalize(products)

#fields which are not nested and not require parsing
fields = ['sku', 'name', 'regularPrice', 'manufacturer']
#nested field is called 'details', as mentioned can have a lot of different subfields
featureFields = []


#first build a list which will have all potential features from the nested field
for i in range(0,len(df)):
    row = df.iloc[i]
    for detail in row['details']:
        featureFields.append(detail['name'].split('>', 1)[-1])

#make a list unique
featureFields = set(featureFields)      
fields = set(fields)

#now we go over each record in dataframe and parse nested field to a dict
records = []

for i in range(0,len(df)):
    row = df.iloc[i]
    record = dict.fromkeys(fields)
    record['name'] = row['name']
    record['regularPrice'] = row['regularPrice']
    record['manufacturer'] = row['manufacturer']
    record['sku'] = row['sku']
    for detail in row['details']:
        record[detail['name'].split('>', 1)[-1]] = detail['value'].split('>', 1)[-1]
    records.append(record)

#finally we have not nested list of dictionaries with records
dfFinal = pd.DataFrame.from_dict(records)
Maksim Khaitovich
  • 4,742
  • 7
  • 39
  • 70