How to convert the cells in a row of a DataFrame to a dictionary using a loop in Python? (pandas)

Question

Let's say I have the following df:

       0      0      1               1    2       2     3        3      4    4    5     5         
0  Fondo Oceano Cuerpo Cuerpo cangrejo Ojos Antenas Color Amarillo Pinzas None Puas None            
1  Fondo Oceano Cuerpo Cuerpo cangrejo Ojos Antenas Color Amarillo Pinzas None Puas Arena     
2  Fondo Oceano Cuerpo Cuerpo cangrejo Ojos Antenas Color Amarillo Pinzas None Puas Marron    
3  Fondo Oceano Cuerpo Cuerpo cangrejo Ojos Antenas Color Amarillo Pinzas None Puas Purpura    
4  Fondo Oceano Cuerpo Cuerpo cangrejo Ojos Antenas Color Amarillo Pinzas None Puas Verde

I know I can use Series.items (Series.iteritems in pandas versions before 2.0, where it was removed) to iterate over a particular row of this df and print the content of each cell in that row (ignoring the index column):

row = 0  # desired row
for _, e in df.iloc[row].items():
    print(e)

Output:

Fondo
Oceano
Cuerpo
Cuerpo cangrejo
Ojos
Antenas
Color
Amarillo
Pinzas
None
Puas
None

But what I would like to learn now is how to improve the loop above so that it builds a dictionary with the even-positioned cells as keys and the odd-positioned cells as values.

In other words, how could I get the following dictionary as output for row 0?

the_dic = { 'Fondo':'Oceano',
            'Cuerpo': 'Cuerpo cangrejo',
            'Ojos': 'Antenas',
            'Color': 'Amarillo',
            'Pinzas': 'None',
            'Puas': 'None'
          }

PS: The 'None' element in this case is a str value and not the object None

Are you looking for this? https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary – nYuker_98 D, Jan 28 '22 at 07:41

jezrael · Accepted Answer · 2022-01-28T07:49:57.147

EDIT: This solution works if each column name appears exactly twice, as in the sample data:

print (df.columns)
Int64Index([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5], dtype='int64')

You can group the row by its index labels and take the first and second value of each group in a dict comprehension:

row = 0
d = {x.iat[0]: x.iat[1] for name, x in df.iloc[row].groupby(level=0)}
print (d)
{'Fondo': 'Oceano', 'Cuerpo': 'Cuerpo cangrejo', 'Ojos': 'Antenas', 'Color': 'Amarillo', 'Pinzas': 'None', 'Puas': 'None'}
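
The comprehension above assumes every column label occurs exactly twice, so that each group holds one key and one value. A minimal sketch on a standalone Series (the names `s`, `d`, `unique` and `raised` are only illustrative) shows the grouping, and what appears to happen when the labels are unique instead:

```python
import pandas as pd

# One row of the sample data as a Series whose index repeats each label twice
s = pd.Series(['Fondo', 'Oceano', 'Cuerpo', 'Cuerpo cangrejo', 'Ojos', 'Antenas',
               'Color', 'Amarillo', 'Pinzas', 'None', 'Puas', 'None'],
              index=[0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

# Each group holds the two cells sharing a label: the first is the key, the second the value
d = {x.iat[0]: x.iat[1] for _, x in s.groupby(level=0)}
print(d)

# With unique labels (0..11) every group has a single cell, so x.iat[1] is out of bounds
unique = s.reset_index(drop=True)
raised = False
try:
    {x.iat[0]: x.iat[1] for _, x in unique.groupby(level=0)}
except IndexError as err:
    raised = True
    print('IndexError:', err)
```

If the labels in your own frame are unique, assigning paired labels first (for example `df.columns = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]`) should make the grouping behave as in the first half of the sketch.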

Or select the first and the last occurrence of each duplicated index label, zip them together and pass the result to dict:

row = 0
s = df.iloc[row]

d = dict(zip(s[~s.index.duplicated()], s[~s.index.duplicated(keep='last')]))
print (d)
{'Fondo': 'Oceano', 'Cuerpo': 'Cuerpo cangrejo', 'Ojos': 'Antenas', 'Color': 'Amarillo', 'Pinzas': 'None', 'Puas': 'None'}
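
If the column labels are not duplicated in pairs, or are not known in advance, a position-based variant may be simpler, since it only relies on keys sitting at even positions and values at odd positions. This is a sketch under that assumption rather than part of the solutions above; the helper name `row_to_dict` is made up for illustration:

```python
import pandas as pd

def row_to_dict(s):
    """Pair the cells at even positions (keys) with the cells at odd positions (values)."""
    return dict(zip(s.iloc[::2], s.iloc[1::2]))

# One row of the sample data; the default index 0..11 has no duplicated labels
s = pd.Series(['Fondo', 'Oceano', 'Cuerpo', 'Cuerpo cangrejo', 'Ojos', 'Antenas',
               'Color', 'Amarillo', 'Pinzas', 'None', 'Puas', 'None'])

d = row_to_dict(s)
print(d)

# The same pairing written as an explicit loop over positions 0, 2, 4, ...
values = s.tolist()
d_loop = {}
for i in range(0, len(values) - 1, 2):
    d_loop[values[i]] = values[i + 1]
print(d_loop)
```

With a DataFrame, the call would be `row_to_dict(df.iloc[row])`. Because only positions are used, the result does not depend on how the columns are named.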

For testing:

s = pd.Series(['Fondo', 'Oceano', 'Cuerpo', 'Cuerpo cangrejo', 'Ojos', 
               'Antenas', 'Color', 'Amarillo', 'Pinzas', 'None', 'Puas', 'None'],
              index=[0,0,1,1,2,2,3,3,4,4,5,5])
print (s)
0              Fondo
0             Oceano
1             Cuerpo
1    Cuerpo cangrejo
2               Ojos
2            Antenas
3              Color
3           Amarillo
4             Pinzas
4               None
5               Puas
5               None
dtype: object

d = dict(zip(s[~s.index.duplicated()], s[~s.index.duplicated(keep='last')]))
print (d)
{'Fondo': 'Oceano', 'Cuerpo': 'Cuerpo cangrejo', 'Ojos': 'Antenas', 'Color': 'Amarillo', 'Pinzas': 'None', 'Puas': 'None'}

edited Jan 28 '22 at 07:49

answered Jan 28 '22 at 07:30

jezrael


Hmph, I tried the first one and I got the following error: `IndexError: index 1 is out of bounds for axis 0 with size 1`, and second one creates a dictionary with the same `keys` as `values` for every cell in the row :( – NoahVerner Jan 28 '22 at 07:35
@NoahVerner - are there columns names duplicated? What is `print (df.columns)` ? – jezrael Jan 28 '22 at 07:36
Yes, the name of the columns are duplicated :( @jezreal – NoahVerner Jan 28 '22 at 07:37
I get the warning `Undefined name Int64Index` after typing `Int64Index([0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5], dtype='int64')`, did I forget to import any module? – NoahVerner Jan 28 '22 at 07:45
@NoahVerner - You can use `df.columns=[0,0,1,1,2,2,3,3,4,4,5,5]` – jezrael Jan 28 '22 at 07:50
Or `s.index=[0,0,1,1,2,2,3,3,4,4,5,5]` – jezrael Jan 28 '22 at 07:50
Thank you sir, after adding `df_metadata.columns=[0,0,1,1,2,2,3,3,4,4,5,5]` right above the second solution you provided the program printed what I needed :), I appreciate it. – NoahVerner Jan 28 '22 at 07:54
