1

I'm new in Data Engineering I'm having a hard time trying to unflatten and convert to a CSV file the following JSON. I had tried using Pandas, Python, JQ to do it but it seems like I'm not making any progress. I tried several solutions erased them and try again but I cannot make it work. Can I get any advice on this? I tried exploding the arrays in pandas but it did not work.

{
"recipes": [
    {
        "vegetarian": true,
        "vegan": true,
        "glutenFree": true,
        "dairyFree": true,
        "veryHealthy": false,
        "cheap": false,
        "veryPopular": false,
        "sustainable": false,
        "weightWatcherSmartPoints": 4,
        "gaps": "no",
        "lowFodmap": false,
        "aggregateLikes": 224,
        "spoonacularScore": 55.0,
        "healthScore": 7.0,
        "creditsText": "Full Belly Sisters",
        "license": "CC BY-SA 3.0",
        "sourceName": "Full Belly Sisters",
        "pricePerServing": 138.79,
        "extendedIngredients": [
            {
                "id": 2069,
                "aisle": "Oil, Vinegar, Salad Dressing",
                "image": "balsamic-vinegar.jpg",
                "consistency": "liquid",
                "name": "balsamic vinegar",
                "nameClean": "balsamic vinegar",
                "original": "2T balsamic vinegar",
                "originalString": "2T balsamic vinegar",
                "originalName": "balsamic vinegar",
                "amount": 2.0,
                "unit": "T",
                "meta": [],
                "metaInformation": [],
                "measures": {
                    "us": {
                        "amount": 2.0,
                        "unitShort": "Tbsps",
                        "unitLong": "Tbsps"
                    },
                    "metric": {
                        "amount": 2.0,
                        "unitShort": "Tbsps",
                        "unitLong": "Tbsps"
                    }
                }
            },
            {
                "id": 1034053,
                "aisle": "Oil, Vinegar, Salad Dressing",
                "image": "olive-oil.jpg",
                "consistency": "liquid",
                "name": "extra virgin olive oil",
                "nameClean": "extra virgin olive oil",
                "original": "4T extra virgin olive oil",
                "originalString": "4T extra virgin olive oil",
                "originalName": "extra virgin olive oil",
                "amount": 4.0,
                "unit": "T",
                "meta": [],
                "metaInformation": [],
                "measures": {
                    "us": {
                        "amount": 4.0,
                        "unitShort": "Tbsps",
                        "unitLong": "Tbsps"
                    },
                    "metric": {
                        "amount": 4.0,
                        "unitShort": "Tbsps",
                        "unitLong": "Tbsps"
                    }
                }
            },
            {
                "id": 2044,
                "aisle": "Produce;Spices and Seasonings",
                "image": "basil.jpg",
                "consistency": "solid",
                "name": "fresh basil",
                "nameClean": "basil",
                "original": "1/4c chopped fresh basil",
                "originalString": "1/4c chopped fresh basil",
                "originalName": "chopped fresh basil",
                "amount": 0.25,
                "unit": "c",
                "meta": [
                    "fresh",
                    "chopped"
                ],
                "metaInformation": [
                    "fresh",
                    "chopped"
                ],
                "measures": {
                    "us": {
                        "amount": 0.25,
                        "unitShort": "cups",
                        "unitLong": "cups"
                    },
                    "metric": {
                        "amount": 59.147,
                        "unitShort": "ml",
                        "unitLong": "milliliters"
                    }
                }
            },
            {
                "id": 11215,
                "aisle": "Produce",
                "image": "garlic.png",
                "consistency": "solid",
                "name": "garlic",
                "nameClean": "garlic",
                "original": "3 cloves garlic",
                "originalString": "3 cloves garlic",
                "originalName": "garlic",
                "amount": 3.0,
                "unit": "cloves",
                "meta": [],
                "metaInformation": [],
                "measures": {
                    "us": {
                        "amount": 3.0,
                        "unitShort": "cloves",
                        "unitLong": "cloves"
                    },
                    "metric": {
                        "amount": 3.0,
                        "unitShort": "cloves",
                        "unitLong": "cloves"
                    }
                }
            },
            {
                "id": 10111529,
                "aisle": "Produce",
                "image": "cherry-tomatoes.png",
                "consistency": "solid",
                "name": "grape tomatoes",
                "nameClean": "grape tomato",
                "original": "2-3 cups grape tomatoes, halved",
                "originalString": "2-3 cups grape tomatoes, halved",
                "originalName": "grape tomatoes, halved",
                "amount": 2.0,
                "unit": "cups",
                "meta": [
                    "halved"
                ],
                "metaInformation": [
                    "halved"
                ],
                "measures": {
                    "us": {
                        "amount": 2.0,
                        "unitShort": "cups",
                        "unitLong": "cups"
                    },
                    "metric": {
                        "amount": 473.176,
                        "unitShort": "ml",
                        "unitLong": "milliliters"
                    }
                }
            },
            {
                "id": 11282,
                "aisle": "Produce",
                "image": "brown-onion.png",
                "consistency": "solid",
                "name": "onion",
                "nameClean": "onion",
                "original": "1/2 onion, sliced",
                "originalString": "1/2 onion, sliced",
                "originalName": "onion, sliced",
                "amount": 0.5,
                "unit": "",
                "meta": [
                    "sliced"
                ],
                "metaInformation": [
                    "sliced"
                ],
                "measures": {
                    "us": {
                        "amount": 0.5,
                        "unitShort": "",
                        "unitLong": ""
                    },
                    "metric": {
                        "amount": 0.5,
                        "unitShort": "",
                        "unitLong": ""
                    }
                }
            },
            {
                "id": 1032009,
                "aisle": "Spices and Seasonings",
                "image": "red-pepper-flakes.jpg",
                "consistency": "solid",
                "name": "red pepper flakes",
                "nameClean": "red pepper flakes",
                "original": "couple of pinches of red pepper flakes",
                "originalString": "couple of pinches of red pepper flakes",
                "originalName": "couple of of red pepper flakes",
                "amount": 2.0,
                "unit": "pinches",
                "meta": [
                    "red"
                ],
                "metaInformation": [
                    "red"
                ],
                "measures": {
                    "us": {
                        "amount": 2.0,
                        "unitShort": "pinches",
                        "unitLong": "pinches"
                    },
                    "metric": {
                        "amount": 2.0,
                        "unitShort": "pinches",
                        "unitLong": "pinches"
                    }
                }
            },
            {
                "id": 1102047,
                "aisle": "Spices and Seasonings",
                "image": "salt-and-pepper.jpg",
                "consistency": "solid",
                "name": "salt and pepper",
                "nameClean": "salt and pepper",
                "original": "salt and freshly-ground pepper, to taste",
                "originalString": "salt and freshly-ground pepper, to taste",
                "originalName": "salt and freshly-ground pepper, to taste",
                "amount": 4.0,
                "unit": "servings",
                "meta": [
                    "freshly-ground",
                    "to taste"
                ],
                "metaInformation": [
                    "freshly-ground",
                    "to taste"
                ],
                "measures": {
                    "us": {
                        "amount": 4.0,
                        "unitShort": "servings",
                        "unitLong": "servings"
                    },
                    "metric": {
                        "amount": 4.0,
                        "unitShort": "servings",
                        "unitLong": "servings"
                    }
                }
            }
        ],
        "id": 716413,
        "title": "Oven Roasted Tomato Sauce",
        "readyInMinutes": 45,
        "servings": 4,
        "sourceUrl": "http://fullbellysisters.blogspot.com/2011/10/oven-roasted-tomato-sauce.html",
        "image": "https://spoonacular.com/recipeImages/716413-556x370.jpg",
        "imageType": "jpg",
        "summary": "Oven Roasted Tomato Sauce might be just the sauce you are searching for. Watching your figure? This caveman, gluten free, primal, and whole 30 recipe has <b>155 calories</b>, <b>1g of protein</b>, and <b>14g of fat</b> per serving. For <b>$1.41 per serving</b>, this recipe <b>covers 5%</b> of your daily requirements of vitamins and minerals. Plenty of people made this recipe, and 224 would say it hit the spot. Head to the store and pick up balsamic vinegar, garlic, grape tomatoes, and a few other things to make it today. From preparation to the plate, this recipe takes approximately <b>45 minutes</b>. All things considered, we decided this recipe <b>deserves a spoonacular score of 61%</b>. This score is pretty good. Try <a href=\"https://spoonacular.com/recipes/oven-roasted-tomato-sauce-617765\">Oven Roasted Tomato Sauce</a>, <a href=\"https://spoonacular.com/recipes/chicken-parmesan-with-oven-roasted-tomato-sauce-360158\">Chicken Parmesan with Oven-Roasted Tomato Sauce</a>, and <a href=\"https://spoonacular.com/recipes/oven-roasted-tomato-basil-marinara-sauce-800148\">Oven-Roasted Tomato Basil Marinara Sauce</a> for similar recipes.",
        "cuisines": [],
        "dishTypes": [
            "sauce"
        ],
        "diets": [
            "gluten free",
            "dairy free",
            "paleolithic",
            "lacto ovo vegetarian",
            "primal",
            "vegan"
        ],
        "occasions": [],
        "instructions": "",
        "analyzedInstructions": [],
        "originalId": null,
        "spoonacularSourceUrl": "https://spoonacular.com/oven-roasted-tomato-sauce-716413"
    }
]

}

I tried this code:

d.apply(lambda x: x.explode() if x.name in ['cuisines' 'dishTypes', 'diets', 'occasions','analyzedInstructions'] else x) 

The data comes from Spoonacular API:

import json
import pandas as pd
import requests # get connection
import json
from pandas.io.json import json_normalize  #package for flattening json in pandas df

YOUR_API_KEY='XXXXXXX'
base_url = 'https://api.spoonacular.com/recipes/random?limitLicense=false&tags=vegan&number=1&apiKey='+YOUR_API_KEY
json_text = requests.get(base_url).json()

json_text = json.loads(raw)
json_text = json_normalize(json_text['recipes'])
json_text.head(3)

I'm expecting a JSON that I can convert to CSV:

and get the following as headers:

    "vegetarian"
    "vegan"
    "glutenFree"
    "dairyFree"
    "veryHealthy"
    "cheap"
    "veryPopular"
    "sustainable"
    "weightWatcherSmartPoints"
    "gaps"
    "lowFodmap"
    "aggregateLikes"
    "spoonacularScore"
    "healthScore"
    "pricePerServing"

"extendedIngredients"  
   "id"
   "image"
   "name"
   "original"
   "amount"
    "unit"
        
"measures"          
   "metric"
   "amount"
   "unitShort"
   "unitLong"
         
 "id"
    "title": 
    "readyInMinutes"
    "servings"
    "sourceUrl"
    "image"
    "summary"
    "cuisines"
    "dishTypes"
    "diets"
    "occasions"
    "instructions"

"analyzedInstructions":         
   "name"
   "steps"         
   "number"
   "step"

Thanks for any help I could get

  • 7
    Will be good to explain what do you mean under *"flatten"* and add output sample. – Olvin Roght Apr 19 '21 at 21:34
  • Yeah sorry, I updated my question with more details @OlvinRoght – Amy raygada Apr 19 '21 at 22:20
  • 1
    Your expected output is still not clear. In your linked CSV you only includes data that is not nested. If that's what you're looking for then you don't need to _"flatten"_ at all. – Henry Ecker Apr 19 '21 at 23:05
  • 1
    Assuming that you do want to get some of those nested values how do you expect to receive that information? For Example, there are 7 items in "extendedIngredients" each of which has 19 elements. Do you want 7 rows with duplicated values for the other columns or do you expect to add 133 new columns to flatten to a single row? Please include a complete sample output for 1 of the recipes, or specify which keys you are looking to keep. – Henry Ecker Apr 19 '21 at 23:07
  • Hi @HenryEcker I updated the info already, the JSON contains only one recipe so is HUGE – Amy raygada Apr 20 '21 at 08:00

1 Answers1

0

It looks like you just want the top-level keys with scalar values, so, using jq, you could write:

jq -r '
  ((.recipes[0]
    | with_entries(select(.value|scalars)))
    | keys_unsorted) as $keys
  | $keys,
    (.recipes[] | [ .[$keys[]] ])
  | @csv
'

This produces a row of headers, and assumes that the objects under .recipes are sufficiently conformal.

If you want a general solution that will "flatten" an arbitrarily-nested JSON, you might like to consider the generic JSON-to-CSV converter at jq: Object cannot be csv-formatted, only array

peak
  • 105,803
  • 17
  • 152
  • 177
  • Hi there, Thanks so much for your code is super useful :) I fixed some errors that I got while running the code, however, I'm now getting 2 lists as output once with the keys and one with the answers, can you enlight me on how convert this to CSV correctly please? jq -r ' ((.recipes[0] | with_entries(select(.value|scalars))) | keys_unsorted) as $keys | $keys, (.recipes[] | [ .[$keys[]] ]) ' recipes.json > response2.csv – Amy raygada Apr 20 '21 at 11:25
  • Notice the last line of the jq program has @csv – peak Apr 20 '21 at 16:11