1

I have a Python dict that is organized like this:

{
  "city": "New York",
  "zip": "11365",
  "ages": {
    "Joe": 25,
    "Jane": 30,
    "Taylor": 45
  },
  "dinner": {
    "aperitif": "tsikoudia",
    "appetizer": "edamame",
    "main course": "monkey brains"
  }
}

My goal is to get a csv file that looks like this:

Name,age,city
Joe,24,New York
Jane,30,New York
Taylor,45,New York

(As you may note, I'm only wanting to use part of the dictionary, and use a data item that is disparate from the others within the dictionary)

What is the most efficient way to get there?

Yaron Shragai
  • 137
  • 11

3 Answers3

0

You might want to filter your dictionary and build a dataframe from it. Below script will result a dataframe with the index of person's name. For converting index to column, you might want to check [1].

data_dict = {
  "city": "New York",
  "zip": "11365",
  "ages": {
    "Joe": 25,
    "Jane": 30,
    "Taylor": 45
  },
  "dinner": {
    "aperitif": "tsikoudia",
    "appetizer": "edamame",
    "main course": "monkey brains"
  }
}

keys = ['ages', 'city']
cache_dict = { key: data_dict[key] for key in keys }
df = pd.DataFrame.from_dict(cache_dict)
df.to_csv('out.csv', index=True)
Jiho Choi
  • 1,083
  • 1
  • 10
  • 26
0

If you change the "ages" entry in your dictionary to have a record layout (e.g. it is a list of dictionaries instead of a dictionary within a dictionary) you can use pd.json_normalize to easily achieve the desired output. However this method may not be the most "efficient" as I'm not entirely sure of the efficiency of the pd.json_normalize method.

transform json_data to contain record-style data

json_data["ages"] = [{"name": k, "age": v} for k, v in json_data["ages"].items()]

print(data)
{'city': 'New York',
 'zip': '11365',
 'ages': [{'age': 25, 'name': 'Joe'},
          {'age': 30, 'name': 'Jane'},
          {'age': 45, 'name': 'Taylor'}],
 'dinner': {'aperitif': 'tsikoudia',
            'appetizer': 'edamame',
            'main course': 'monkey brains'}}

Parse the json data indicating that our records are within the "ages" key, and that we have a meta key (a key that spans multiple records) with the "city" key.

df = pd.json_normalize(json_data, record_path="ages", meta="city")

print(df)
     name  age      city
0     Joe   25  New York
1    Jane   30  New York
2  Taylor   45  New York

Alternative solution In terms of efficiency, you may get slightly better performance from manually iterating over your entries and aggregating your rows into a 2d list and tranform that into a dataframe:

data = []
for name, age in json_data["ages"].items():
    row = [name, age, json_data["city"]]
    data.append(row)

df = pd.DataFrame(data, columns=["name", "age", "city"])

print(df)
     name  age      city
0     Joe   25  New York
1    Jane   30  New York
2  Taylor   45  New York
Cameron Riddell
  • 10,942
  • 9
  • 19
0
import pandas as pd
d = {
  "city": "New York",
  "zip": "11365",
  "ages": {
    "Joe": 25,
    "Jane": 30,
    "Taylor": 45
  },
  "dinner": {
    "aperitif": "tsikoudia",
    "appetizer": "edamame",
    "main course": "monkey brains"
  }
}

df = pd.DataFrame()
df['name'] = d['ages'].keys()
df['age'] = d['ages'].values()
df['city'] = d['city']
df.to_csv(r'name.csv', index=False)