
I have a DataFrame like this:

Name    Nationality    Tall    Age
John    USA            190     24
Thomas  French         194     25
Anton   Malaysia       180     23
Chris   Argentina      190     26

Suppose the incoming data is structured like this, where each element holds the data for one row:

data = [{
         'food':{'lunch':'Apple',
                'breakfast':'Milk',
                'dinner':'Meatball'},
         'drink':{'favourite':'coke',
                   'dislike':'juice'}
         },
         # ... and 3 other records
       ]

'data' is a variable that holds the food and drink predicted by my machine learning model. There are more records (about 400k rows), but I process them in batches (currently 2k rows per iteration). The expected result looks like this:

Name    Nationality    Tall    Age Lunch Breakfast Dinner   Favourite Dislike
John    USA            190     24  Apple Milk      Meatball Coke      Juice
Thomas  French         194     25  ....
Anton   Malaysia       180     23  ....
Chris   Argentina      190     26  ....

Is there an efficient way to build that DataFrame? So far I have tried iterating over 'data' and extracting the value of each predicted label one by one, but that feels like it takes a long time.

Fregy

1 Answer


You need to flatten the dictionaries first, then create a DataFrame from them and join it to the original:

data = [{
         'a':{'lunch':'Apple',
                'breakfast':'Milk',
                'dinner':'Meatball'},
         'b':{'favourite':'coke',
              'dislike':'juice'}
         },
         {
         'a':{'lunch':'Apple1',
                'breakfast':'Milk1',
                'dinner':'Meatball2'},
         'b':{'favourite':'coke2',
              'dislike':'juice3'}
         },

         {
         'a':{'lunch':'Apple4',
                'breakfast':'Milk5',
                'dinner':'Meatball4'},
         'b':{'favourite':'coke2',
              'dislike':'juice4'}
         },
         {
         'a':{'lunch':'Apple3',
                'breakfast':'Milk8',
                'dinner':'Meatball7'},
         'b':{'favourite':'coke4',
              'dislike':'juice1'}
         }
]

import pandas as pd

# Flatten each record: merge its inner dicts into one flat dict per row
L = [{k: v for x in d.values() for k, v in x.items()} for d in data]

df1 = pd.DataFrame(L)
print(df1)
  breakfast     dinner dislike favourite   lunch
0      Milk   Meatball   juice      coke   Apple
1     Milk1  Meatball2  juice3     coke2  Apple1
2     Milk5  Meatball4  juice4     coke2  Apple4
3     Milk8  Meatball7  juice1     coke4  Apple3
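
As a possible alternative to the dict comprehension, `pd.json_normalize` (available in pandas 1.0 and later) can also flatten nested records. The sketch below is only an illustration: it uses a shortened two-record sample, and it assumes the inner keys are unique across the outer groups, so dropping the `a.` / `b.` prefixes does not create duplicate column names.

```python
import pandas as pd

# Shortened sample in the same shape as the data above
data = [
    {"a": {"lunch": "Apple", "breakfast": "Milk", "dinner": "Meatball"},
     "b": {"favourite": "coke", "dislike": "juice"}},
    {"a": {"lunch": "Apple1", "breakfast": "Milk1", "dinner": "Meatball2"},
     "b": {"favourite": "coke2", "dislike": "juice3"}},
]

# json_normalize names nested columns "outer.inner", e.g. "a.lunch"
flat = pd.json_normalize(data)

# Keep only the inner key; assumes inner keys do not collide across groups
flat.columns = [name.split(".", 1)[1] for name in flat.columns]
```

The dict comprehension does the same job with less overhead for data this regular, so this is mainly useful when the nesting is deeper or less uniform.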

# join aligns on the index, so df and df1 must share the same row labels (here 0..3)
df2 = df.join(df1)
print(df2)
     Name Nationality  Tall  Age breakfast     dinner dislike favourite  \
0    John         USA   190   24      Milk   Meatball   juice      coke   
1  Thomas      French   194   25     Milk1  Meatball2  juice3     coke2   
2   Anton    Malaysia   180   23     Milk5  Meatball4  juice4     coke2   
3   Chris   Argentina   190   26     Milk8  Meatball7  juice1     coke4   

    lunch  
0   Apple  
1  Apple1  
2  Apple4  
3  Apple3  
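
Because `join` aligns on the index, batch-wise processing needs each batch of predictions to carry the index of the rows it belongs to. The following is a minimal sketch of that idea, not a definitive implementation: `predict_batch` and `flatten` are hypothetical helpers, the model output is a fixed placeholder, and the batch size is reduced to 2 so the example runs on the four sample rows.

```python
import pandas as pd

# Stand-in for the original DataFrame from the question
df = pd.DataFrame({
    "Name": ["John", "Thomas", "Anton", "Chris"],
    "Nationality": ["USA", "French", "Malaysia", "Argentina"],
    "Tall": [190, 194, 180, 190],
    "Age": [24, 25, 23, 26],
})


def flatten(record):
    """Merge the inner dicts of one prediction record into a single flat dict."""
    return {k: v for inner in record.values() for k, v in inner.items()}


def predict_batch(batch):
    # Hypothetical placeholder for the model: one nested record per input row
    return [
        {"food": {"lunch": "Apple", "breakfast": "Milk", "dinner": "Meatball"},
         "drink": {"favourite": "coke", "dislike": "juice"}}
        for _ in range(len(batch))
    ]


batch_size = 2  # the question uses 2,000 rows per iteration
parts = []
for start in range(0, len(df), batch_size):
    batch = df.iloc[start:start + batch_size]
    records = predict_batch(batch)
    # Reuse the batch's index so the final join lines up row by row
    parts.append(pd.DataFrame([flatten(r) for r in records], index=batch.index))

# Concatenate once at the end instead of joining inside the loop
predictions = pd.concat(parts)
result = df.join(predictions)
```

Collecting the per-batch frames in a list and calling `pd.concat` once avoids repeatedly copying a growing DataFrame inside the loop.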
jezrael