
Hi everyone, I'm currently working on a school project and I need to convert my dict to a DataFrame in order to use it for machine learning.

myDic = {
    'Acura': {
      'CL': {
        '2003': {
          'transmission': '4',
          'engine': '1',
          'drivetrain': 'NHTSA: 13',
          'wheels_hubs': 'NHTSA: 8',
          'seat_belts_air_bags': 'NHTSA: 6',
          'brakes': 'NHTSA: 6',
          'lights': 'NHTSA: 5',
          'body_paint': 'NHTSA: 2',
          'fuel_system': 'NHTSA: 2',
          'electrical': 'NHTSA: 2',
          'suspension': 'NHTSA: 2',
          'miscellaneous': 'NHTSA: 1',
          'steering': 'NHTSA: 1'
        },
        '2002': {
          'transmission': '2',
          'engine': 'NHTSA: 8',
          'brakes': 'NHTSA: 7',
          'electrical': 'NHTSA: 4',
          'accessories-interior': 'NHTSA: 3',
          'seat_belts_air_bags': 'NHTSA: 3',
          'suspension': 'NHTSA: 2',
          'drivetrain': 'NHTSA: 2',
          'body_paint': 'NHTSA: 1',
          'accessories-exterior': 'NHTSA: 1',
          'windows_windshield': 'NHTSA: 1',
          'fuel_system': 'NHTSA: 1',
          'steering': 'NHTSA: 1',
          'miscellaneous': 'NHTSA: 1'
        }
      }
    }
}

It goes on like that. I can look entries up as myDic['Acura']['CL']['2003'], i.e. brand -> model -> year, and that gives the problems reported for the car. How can I convert this into a DataFrame whose columns are brand, model, year and the problems?
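One way to get exactly those columns is to build one flat record per (brand, model, year) and hand the list of records to pandas. This is a minimal sketch, not the approach used in the answer below; it runs on a two-year excerpt of `myDic` so it is self-contained, and it assumes no problem category is itself named `brand`, `model` or `year` (such a key would overwrite the identifier).

```python
import pandas as pd

# Excerpt of the question's data: brand -> model -> year -> {problem: value}
myDic = {
    'Acura': {
        'CL': {
            '2003': {'transmission': '4', 'engine': '1', 'drivetrain': 'NHTSA: 13'},
            '2002': {'transmission': '2', 'engine': 'NHTSA: 8', 'brakes': 'NHTSA: 7'},
        }
    }
}

# One dict per car; the problem keys are merged in next to the identifiers.
# Assumes no problem key collides with 'brand', 'model' or 'year'.
records = [
    {'brand': brand, 'model': model, 'year': year, **problems}
    for brand, models in myDic.items()
    for model, years in models.items()
    for year, problems in years.items()
]

df = pd.DataFrame(records)
print(df)
```

Problem categories that a given year does not report show up as NaN, and no tuple keys or MultiIndex handling are involved.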

Giorgos Myrianthous
Fatih Can
  • Does this answer your question? [Construct pandas DataFrame from items in nested dictionary](https://stackoverflow.com/questions/13575090/construct-pandas-dataframe-from-items-in-nested-dictionary) – PiCTo Apr 30 '20 at 14:53

1 Answer


I assume what you are looking for is:

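import pandas as pd


# Flatten brand -> model -> year into one (brand, model, year) tuple key per car
restructure_dict = {
    (level1_key, level2_key, level3_key): values
    for level1_key, level2_dict in myDic.items()
    for level2_key, level3_dict in level2_dict.items()
    for level3_key, values in level3_dict.items()
}
# Tuple keys become MultiIndex columns; .T turns them into rows and
# reset_index() moves the three index levels into columns level_0..level_2
df = pd.DataFrame(restructure_dict).T.reset_index()
df = df.rename(columns={'level_0': 'brand', 'level_1': 'model', 'level_2': 'year'})
print(df)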

And the output will be:

   brand model  year transmission    engine drivetrain wheels_hubs seat_belts_air_bags    brakes    lights body_paint fuel_system electrical suspension miscellaneous  steering accessories-interior accessories-exterior windows_windshield
0  Acura    CL  2003            4         1  NHTSA: 13    NHTSA: 8            NHTSA: 6  NHTSA: 6  NHTSA: 5   NHTSA: 2    NHTSA: 2   NHTSA: 2   NHTSA: 2      NHTSA: 1  NHTSA: 1                  NaN                  NaN                NaN
1  Acura    CL  2002            2  NHTSA: 8   NHTSA: 2         NaN            NHTSA: 3  NHTSA: 7       NaN   NHTSA: 1    NHTSA: 1   NHTSA: 4   NHTSA: 2      NHTSA: 1  NHTSA: 1             NHTSA: 3             NHTSA: 1           NHTSA: 1
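The cells in that output are still strings such as 'NHTSA: 13', while most machine-learning libraries (scikit-learn, for instance) expect numeric features. A minimal sketch of one possible conversion, assuming that the number in each cell is a count that can be used as a numeric value and that a missing category (NaN) can be read as zero; the thread confirms neither assumption. The frame is a hand-built excerpt of the output above so the block runs on its own.

```python
import pandas as pd

# Excerpt shaped like the flat output above
df = pd.DataFrame({
    'brand': ['Acura', 'Acura'],
    'model': ['CL', 'CL'],
    'year': ['2003', '2002'],
    'transmission': ['4', '2'],
    'engine': ['1', 'NHTSA: 8'],
    'wheels_hubs': ['NHTSA: 8', None],
})

id_cols = ['brand', 'model', 'year']
problem_cols = df.columns.drop(id_cols)

# Take the first run of digits in each cell ('NHTSA: 8' -> 8, '4' -> 4).
# ASSUMPTION: that number is a count that is meaningful as a numeric feature.
numeric = df[problem_cols].apply(
    lambda col: pd.to_numeric(col.str.extract(r'(\d+)', expand=False))
)
# ASSUMPTION: a missing category means zero reported problems.
numeric = numeric.fillna(0).astype(int)

ml_df = pd.concat([df[id_cols], numeric], axis=1)
ml_df['year'] = ml_df['year'].astype(int)
print(ml_df)
```

Brand and model are still text after this; `pd.get_dummies` is one common way to one-hot encode them if a model needs them as features.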

Another possible solution could be this:

import pandas as pd


restructure_dict = {
    (level1_key, level2_key, level3_key): values
    for level1_key, level2_dict in myDic.items()
    for level2_key, level3_dict in level2_dict.items()
    for level3_key, values in level3_dict.items()
}
df = pd.DataFrame(restructure_dict)
print(df)

And the output would be:

                          Acura          
                             CL          
                           2003      2002
transmission                  4         2
engine                        1  NHTSA: 8
drivetrain            NHTSA: 13  NHTSA: 2
wheels_hubs            NHTSA: 8       NaN
seat_belts_air_bags    NHTSA: 6  NHTSA: 3
brakes                 NHTSA: 6  NHTSA: 7
lights                 NHTSA: 5       NaN
body_paint             NHTSA: 2  NHTSA: 1
fuel_system            NHTSA: 2  NHTSA: 1
electrical             NHTSA: 2  NHTSA: 4
suspension             NHTSA: 2  NHTSA: 2
miscellaneous          NHTSA: 1  NHTSA: 1
steering               NHTSA: 1  NHTSA: 1
accessories-interior        NaN  NHTSA: 3
accessories-exterior        NaN  NHTSA: 1
windows_windshield          NaN  NHTSA: 1
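In this layout each car is one column under a three-level column MultiIndex, so a single car is selected with the full (brand, model, year) tuple and all years of a model with the outer levels. A short sketch on a two-year excerpt of `myDic` so it runs on its own:

```python
import pandas as pd

# Excerpt of the question's data
myDic = {
    'Acura': {
        'CL': {
            '2003': {'transmission': '4', 'engine': '1', 'drivetrain': 'NHTSA: 13'},
            '2002': {'transmission': '2', 'engine': 'NHTSA: 8', 'brakes': 'NHTSA: 7'},
        }
    }
}

restructure_dict = {
    (brand, model, year): problems
    for brand, models in myDic.items()
    for model, years in models.items()
    for year, problems in years.items()
}
df = pd.DataFrame(restructure_dict)

# Full tuple -> one Series of problems for a single car
acura_cl_2003 = df[('Acura', 'CL', '2003')]
print(acura_cl_2003['transmission'])

# Outer levels only -> a DataFrame with one column per year of that model
acura_cl = df['Acura']['CL']
print(acura_cl)
```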

Another option would be the transposed version of the above result:

import pandas as pd 


restructure_dict = {
    (level1_key, level2_key, level3_key): values
    for level1_key, level2_dict in myDic.items()
    for level2_key, level3_dict in level2_dict.items()
    for level3_key, values in level3_dict.items()
}

df = pd.DataFrame(restructure_dict).T
print(df)

with an output of:

              transmission    engine drivetrain wheels_hubs seat_belts_air_bags    brakes    lights body_paint fuel_system electrical suspension miscellaneous  steering accessories-interior accessories-exterior windows_windshield
Acura CL 2003            4         1  NHTSA: 13    NHTSA: 8            NHTSA: 6  NHTSA: 6  NHTSA: 5   NHTSA: 2    NHTSA: 2   NHTSA: 2   NHTSA: 2      NHTSA: 1  NHTSA: 1                  NaN                  NaN                NaN
         2002            2  NHTSA: 8   NHTSA: 2         NaN            NHTSA: 3  NHTSA: 7       NaN   NHTSA: 1    NHTSA: 1   NHTSA: 4   NHTSA: 2      NHTSA: 1  NHTSA: 1             NHTSA: 3             NHTSA: 1           NHTSA: 1
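Here brand, model and year live in a row MultiIndex, and the repeated values are only hidden in the printout. Naming the index levels with `rename_axis` and then calling `reset_index` turns them into ordinary columns, with brand, model and year repeated on every row; it is an alternative spelling of the `level_0`/`level_1`/`level_2` rename in the first snippet. A minimal sketch on a two-year excerpt of `myDic`:

```python
import pandas as pd

# Excerpt of the question's data
myDic = {
    'Acura': {
        'CL': {
            '2003': {'transmission': '4', 'engine': '1', 'drivetrain': 'NHTSA: 13'},
            '2002': {'transmission': '2', 'engine': 'NHTSA: 8', 'brakes': 'NHTSA: 7'},
        }
    }
}

restructure_dict = {
    (brand, model, year): problems
    for brand, models in myDic.items()
    for model, years in models.items()
    for year, problems in years.items()
}

# Rows indexed by a three-level MultiIndex (brand, model, year)
df = pd.DataFrame(restructure_dict).T

# Name the levels, then move them out of the index into regular columns
flat = df.rename_axis(['brand', 'model', 'year']).reset_index()
print(flat)
```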
Giorgos Myrianthous
  • Thank you so much for your solution and answer. Is there a way to make the last output look like Acura CL 2003 ....... Acura CL 2002 ........? That would be easier for me to use with machine learning algorithms since I'm just a beginner, or can I use machine learning on that kind of DataFrame as it is? I have a lot of data to use: about 70 brands, maybe 300 models, and tons of years and problems. – Fatih Can Apr 30 '20 at 15:09
  • @FatihCan I updated my answer. I think I know what you are looking for. Hope it helps now. – Giorgos Myrianthous Apr 30 '20 at 15:36
  • I really appreciate it. That's exactly what I was looking for. Thank you so much – Fatih Can Apr 30 '20 at 16:18
  • Hey again, I hope you can see my reply. I tried your updated answer and it worked on my trial dataset, but on my full dataset it fails: the columns contain entries that are supposed to be rows (other car brands and models; there are about 10 columns like Hyundai/C2000/ .... etc) – Fatih Can May 06 '20 at 13:20
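The thread ends without a diagnosis of that failure. One plausible cause, and it is only an assumption, is that some entries in the full dataset do not follow the brand -> model -> year -> problems nesting (a missing or extra level), which shifts keys into the wrong positions of the (brand, model, year) tuple and puts rows and columns in unexpected places. A hypothetical helper, not from the thread, that lists such entries so they can be inspected before building the DataFrame:

```python
# Hypothetical helper (not from the thread): collect every path in a
# brand -> model -> year -> problems dict that breaks that four-level shape.
def find_irregular_entries(data):
    issues = []
    for brand, models in data.items():
        if not isinstance(models, dict):
            issues.append(((brand,), type(models).__name__))
            continue
        for model, years in models.items():
            if not isinstance(years, dict):
                issues.append(((brand, model), type(years).__name__))
                continue
            for year, problems in years.items():
                if not isinstance(problems, dict):
                    # Likely a missing year level: a problem name sits where a year should
                    issues.append(((brand, model, year), type(problems).__name__))
                    continue
                for problem, value in problems.items():
                    if isinstance(value, dict):
                        # One level deeper than expected
                        issues.append(((brand, model, year, problem), 'dict'))
    return issues


# Made-up data: 'BrandX'/'ModelY' skips the year level on purpose.
sample = {
    'Acura': {'CL': {'2003': {'transmission': '4', 'engine': '1'}}},
    'BrandX': {'ModelY': {'transmission': 'NHTSA: 2'}},
}
for path, found in find_irregular_entries(sample):
    print(path, '->', found)
```

An empty result means every entry has the expected depth, in which case the cause lies elsewhere.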