Is there a better way to iterate over a Pandas Dataframe?

Question

I'm checking a subscription products database. And need to iterate over a column and compare it to all the columns in the same row. And at the end, assign it true or false if it finds the same value on a specified column.

df_full_part1

df_full_part2

df_full_part3

df_full_part4

The loop I'm trying

for index, row in df_full.iterrows():
    for col in all_types_df.columns:
        for col2 in df_full.columns[8:]:
            if 'nao' in col2:
                if df_full.loc[index]['got_unexpected'] == True or df_full.loc[index][col2] == 'nenhum':
                    continue
                else:
                    df_full.at[index, 'got_unexpected'] = (all_types_df.loc[index][col] == df_full.loc[index][col2])
            elif 'sempre' in col2:
                if df_full.loc[index]['got_expected'] == True or df_full.loc[index][col2] == 'nenhum':
                    continue
                else:
                    df_full.at[index, 'got_expected'] = (all_types_df.loc[index][col] == df_full.loc[index][col2])

{'all_types_0': ['protetor',
  'spray',
  'pinceis',
  'pinceis',
  'hidratante',
  'hidratante'],
 'all_types_1': [' esfoliante',
  ' protetor',
  ' po',
  ' geral',
  ' serum',
  ' shampoo'],
 'all_types_2': [' spray',
  ' toalha',
  ' geral',
  ' geral',
  ' tonico',
  ' tonico'],
 'all_types_3': [' toalha',
  ' esfoliante',
  'nenhum',
  ' po',
  ' shampoo',
  ' serum'],
 'box_order_id': [60501, 68771, 70333, 79623, 80858, 89258],
 'email': ['isabelasantana@globo.com',
  'stefany_mend@hotmail.com',
  'isabelasantana@globo.com',
  'stefany_mend@hotmail.com',
  'isabelasantana@globo.com',
  'stefany_mend@hotmail.com'],
 'got_expected': [True, True, True, False, True, True],
 'got_unexpected': [False, 'False', False, 'False', False, 'False'],
 'make_nao_quero_receber_0': ['lapis-boca',
  'lapis-boca',
  'lapis-boca',
  'lapis-boca',
  'lapis-boca',
  'lapis-boca'],
 'make_sempre_0': ['batom', 'corretivo', 'batom', 'pinceis', 'batom', 'batom'],
 'make_sempre_1': ['iluminador',
  'batom',
  'iluminador',
  'produtos-sombrancelha',
  'iluminador',
  'blush'],
 'make_sempre_2': ['pinceis',
  'produtos-sombrancelha',
  'pinceis',
  'batom',
  'pinceis',
  'corretivo'],
 'make_sempre_3': ['primer',
  'pinceis',
  'primer',
  'blush',
  'primer',
  'delineador'],
 'make_sempre_4': ['nenhum',
  'mascara-cilios',
  'nenhum',
  'corretivo',
  'nenhum',
  'gloss'],
 'make_sempre_5': ['nenhum',
  'iluminador',
  'nenhum',
  'delineador',
  'nenhum',
  'iluminador'],
 'make_sempre_6': ['nenhum',
  'gloss',
  'nenhum',
  'gloss',
  'nenhum',
  'mascara-cilios'],
 'make_sempre_7': ['nenhum',
  'blush',
  'nenhum',
  'iluminador',
  'nenhum',
  'pinceis'],
 'make_sempre_8': ['nenhum',
  'delineador',
  'nenhum',
  'mascara-cilios',
  'nenhum',
  'produtos-sombrancelha'],
 'skin_nao_quero_receber_0': ['balm-tatuagem',
  'nenhum',
  'balm-tatuagem',
  'nenhum',
  'balm-tatuagem',
  'nenhum'],
 'skin_sempre_0': ['esfoliante',
  'limpador-facial',
  'esfoliante',
  'esfoliante',
  'esfoliante',
  'esfoliante'],
 'skin_sempre_1': ['hidratante-labial',
  'esfoliante',
  'hidratante-labial',
  'hidratante-labial',
  'hidratante-labial',
  'hidratante-labial'],
 'skin_sempre_2': ['hidratante',
  'hidratante-labial',
  'hidratante',
  'hidratante',
  'hidratante',
  'hidratante'],
 'skin_sempre_3': ['limpador-facial',
  'hidratante',
  'limpador-facial',
  'limpador-facial',
  'limpador-facial',
  'limpador-facial'],
 'skin_sempre_4': ['serum', 'serum', 'serum', 'serum', 'serum', 'serum'],
 'skin_sempre_5': ['tonico', 'nenhum', 'tonico', 'nenhum', 'tonico', 'nenhum'],
 'skin_sempre_6': ['creme-de-maos',
  'nenhum',
  'creme-de-maos',
  'nenhum',
  'creme-de-maos',
  'nenhum']}

I've made this loop because I don't know many columns will I get and each one of them have their own names. eg. Column_1, Column_2, Column_3 and so on

Welcome Thomas! did you read this [question](https://stackoverflow.com/q/16476924/66928) ? — RichieV, Aug 17 '20 at 20:26
I did.. actually that was exactly where I took the idea from.. But I've been reading about the iteration on dataframes, and about how that's not a good practice and probably too slow. And that should be one of my last option. Maybe I didn't make myself clear enough. But I would like to know another way to iterate over the df, instead of actually iterating, you know? I read something about Vectorization. But I want to know if this would apply in my case, and how to do that in an effective way. Thanks for your help! — Thomas Gallasso, Aug 18 '20 at 12:10

score 0 · Answer 1 · answered Aug 18 '20 at 19:51

Here is a solution using only vectorized functions built into pandas.

First of all, please be kind and always provide sample data that can be easily copied into and IDE, either using df.head(10).to_dict('list'), df.head(10) or a callable line as in

df_full = pd.DataFrame([
    ['food', 'drink', 'toy', 'toy', 'food'],
    ['food', 'socks', 'ball', 'toy', 'food'],
    ['bowl', 'supply', 'toy', 'food', 'toy'],
    ['food', 'drink', 'socks', 'socks', 'bowl']],
    columns=['received', 'received1', 'received2', 'dont_want', 'want'])

and of course make sure your sample is representative and can reproduce any behavior/errors you are asking about.

Now, unrelated to you question, but keep in mind you can use df.filter to select columns with a specific pattern.

all_types_df = df_full.filter(regex=r'received')

Finally for you actual question.

df_full['exp'] = ( # find out if they got something they wanted, and save in a new column
    (all_types_df ==
        df_full.want # get want column as 1D array
        .to_frame().values) # cast as a 2D array, so pandas can broadcast and compare to the 2D df
    .any(axis=1) # the lines above return a boolean df with same shape as df_full, aggregate with any row-wise
) # returns a boolean series
df_full['une'] = ( # find out if they got something they did not want
    (all_types_df == df_full.dont_want.to_frame().values).any(axis=1)
) # returns a boolean series

# to account for 'anything' in either column want/dont_want
# once a column is created you can access it as an attribute
df_full.exp = df_full.exp | (df_full.want=='anything') 
df_full.une = df_full.une | (df_full.dont_want=='anything')

print(df_full)

Output

  received received1 received2 dont_want  want    exp    une
0     food     drink       toy       toy  food   True   True
1     food     socks      ball       toy  food   True  False
2     bowl    supply       toy      food   toy   True  False
3     food     drink     socks     socks  bowl  False   True

Notice your table has False on (1, exp) when that row has 'food' in the first column.

Thanks man... I'm sorry for that! I didn't know about the head to dictionary, thanks for that. But the thing is, that I tried to simplify my df, to make it easier to understand but, in fact I've made it harder. Because my biggest problem is that I dont know how many columns I have. So I cant compare the things I received with the thing I want, because I don't know how many I've received nor how many thing do I want. Is it clearer? I'll update the code on my post so you can see the actual table. — Thomas Gallasso, Aug 19 '20 at 11:49

Is there a better way to iterate over a Pandas Dataframe?

1 Answers1