1

I have a pandas dataframe named tshirt_orders from an API call looking like this:

Alice, small, red  
Alice, small, green  
Bob, small, blue  
Bob, small, orange  
Cesar, medium, yellow  
David, large, purple  

How can I get this into a dictionary style format where I first go by size and have sub keys under for name and another sublist for color so that I can address it when iterating over by using tshirt_orders?

Like this:

size:
        small:
            Name:
              Alice:
                 Color:
                    red
                    green
              Bob:
                 Color:
                    blue
                    orange
         medium:
             Name:
               Cesar:
                    Color:
                        yellow
          large:
               Name:
                  David:
                     Color:
                        purple

What would be the best solution to change this? It is in a pandas dataframe but changing that isn't a problem should there be better solutions.

xeet
  • 163
  • 2
  • 10

1 Answers1

2

The close is write DataFrame to yaml.

First create nested dictionaries in dict comprehension:

print (df)
       A       B       C
0  Alice   small     red
1  Alice   small   green
2    Bob   small    blue
3    Bob   small  orange
4  Cesar  medium  yellow
5  David   large  purple

d = {k:v.groupby('A', sort=False)['C'].apply(list).to_dict() 
      for k, v in df.groupby('B', sort=False)}
print (d)
{'small': {'Alice': ['red', 'green'], 
           'Bob': ['blue', 'orange']}, 
'medium': {'Cesar': ['yellow']}, 
'large': {'David': ['purple']}}

Add size to dict for key and then write to yaml file:

import yaml
with open('result.yml', 'w') as yaml_file:
    yaml.dump({'size': d}, yaml_file, default_flow_style=False, sort_keys=False)

size:
  small:
    Alice:
    - red
    - green
    Bob:
    - blue
    - orange
  medium:
    Cesar:
    - yellow
  large:
    David:
    - purple

Or create json:

import json

with open("result.json", "w") as twitter_data_file:
    json.dump({'size': d}, twitter_data_file, indent=4)

{
    "size": {
        "small": {
            "Alice": [
                "red",
                "green"
            ],
            "Bob": [
                "blue",
                "orange"
            ]
        },
        "medium": {
            "Cesar": [
                "yellow"
            ]
        },
        "large": {
            "David": [
                "purple"
            ]
        }
    }
}

EDIT:

df = df.assign(A1='Name', B1='size', C1='Color')

df1 = df.groupby(['B1','B','A1','A','C1'], sort=False)['C'].apply(list).reset_index()

#https://stackoverflow.com/a/19900276
def recur_dictify(frame):
    if len(frame.columns) == 1:
        if frame.values.size == 1: return frame.values[0][0]
        return frame.values.squeeze()
    grouped = frame.groupby(frame.columns[0], sort=False)
    d = {k: recur_dictify(g.iloc[:,1:]) for k,g in grouped}
    return d

d = recur_dictify(df1)
print (d)
{'size': {'small': {'Name': {'Alice': {'Color': ['red', 'green']}, 
                             'Bob': {'Color': ['blue', 'orange']}}}, 
         'medium': {'Name': {'Cesar': {'Color': ['yellow']}}}, 
         'large': {'Name': {'David': {'Color': ['purple']}}}}}

import yaml
with open('result.yml', 'w') as yaml_file:
    yaml.dump(d, yaml_file, default_flow_style=False, sort_keys=False)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252