Dataframe iteration better practices for values assignment

Question

I was wondering how to write cleaner code, so I started paying attention to some of my daily code routines. I frequently have to iterate over a dataframe to build a list of dicts:

foo = []
for index, row in df.iterrows():
    bar = {}
    bar['foobar0'] = row['foobar0']
    bar['foobar1'] = row['foobar1']
    foo.append(bar)

I think this is hard to maintain: if the dataframe's column names change, the loop breaks. Besides that, writing the same key for two data structures is a kind of code duplication.

The context is, I frequently make api calls to a specific endpoint that receives a list of dicts.

I'm looking for improvements to that routine. How can I replace the hard-coded key assignments with some map or lambda tricks, so that key changes in a given dataframe (frequently the result of a database query) don't cause errors?

In other words, if a column name in the database changes, the dataframe keys change too, so I'd like to create a dict on the fly with the same keys as the given dataframe and fill each dict entry with the dataframe's corresponding values.

How can I do that?

Your question is a bit confusing. At the end, you say "I'd like to create a dict on the fly with same keys of a given dataframe and fill each dict entry with dataframe corresponding values", which sounds like you want a dict of lists. But the rest of the question sounds like you want a list of dicts. — abarnert, Jun 28 '18 at 14:00

abarnert · Accepted Answer · 2018-06-28T14:02:10.310

The simple way to do this is to_dict, which takes an orient argument that you can use to specify how you want the result structured.

In particular, orient='records' gives you a list of records, each one a dict in {col1name: col1value, col2name: col2value, ...} format.
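A minimal sketch of orient='records', assuming a small hypothetical frame with the question's column names (in practice df would come from the database query). It also shows why this addresses the maintainability concern: after a column is renamed, the dict keys follow automatically.

```python
import pandas as pd

# Hypothetical frame; in practice df would come from a database query
df = pd.DataFrame({"foobar0": [1, 2], "foobar1": ["x", "y"]})

# One dict per row, keyed by whatever the column names currently are
records = df.to_dict(orient="records")

# If a column is renamed upstream, the keys follow with no code change
renamed = df.rename(columns={"foobar0": "baz"}).to_dict(orient="records")
```

Here records is [{'foobar0': 1, 'foobar1': 'x'}, {'foobar0': 2, 'foobar1': 'y'}], which is the same list of dicts the original loop builds, without naming any column.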

(Your question is a bit confusing. At the very end, you say, "I'd like to create a dict on the fly with same keys of a given dataframe and fill each dict entry with dataframe corresponding values." That makes it sound like you want a dict of lists, which is to_dict(orient='list'), or maybe a dict of dicts, which is to_dict(orient='dict') or just to_dict() because that's the default, rather than a list of dicts.)
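For comparison, here is a sketch of the two column-oriented layouts mentioned above, again using a hypothetical two-column frame. orient='list' maps each column to a list of its values, and orient='dict' maps each column to an {index: value} dict.

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({"foobar0": [1, 2], "foobar1": ["x", "y"]})

# dict of lists: {column: [value, value, ...]}
as_lists = df.to_dict(orient="list")

# dict of dicts: {column: {index: value}}; this is also what to_dict() returns
as_dicts = df.to_dict(orient="dict")
```

With the default RangeIndex, as_dicts is {'foobar0': {0: 1, 1: 2}, 'foobar1': {0: 'x', 1: 'y'}}, so the row index labels become the inner keys.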

If you want to know how to do this manually (which you don't want to actually do, but it's worth understanding): a DataFrame acts like a dict, with the column names as the keys and the Series as the values. So you can get a list of the column names the same way you do with a normal dict:

keys = list(df)

Then:

foo = []
for index, row in df.iterrows():
    bar = {}
    for key in keys:
        bar[key] = row[key]
    foo.append(bar)

Or, more compactly:

foo = [{key: row[key] for key in keys} for _, row in df.iterrows()]
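Putting the manual pieces together, here is a self-contained sketch (hypothetical frame and column names) that checks the comprehension against to_dict(orient='records'):

```python
import pandas as pd

# Hypothetical frame for illustration
df = pd.DataFrame({"foobar0": [1, 2], "foobar1": ["x", "y"]})

# Iterating a DataFrame, or calling list() on it, yields its column labels
keys = list(df)

# One dict per row, keyed by whatever the columns are named right now
foo = [{key: row[key] for key in keys} for _, row in df.iterrows()]

# Same records as the built-in method
assert foo == df.to_dict(orient="records")
```

One reason to prefer to_dict: iterrows yields each row as a Series, so it does not preserve dtypes across columns. For example, with an int column and a float column, the ints come back upcast to floats.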
