Pandas: Dictionary of Dataframes

Question

I have a function that I made to analyze experimental data (all individual .txt files).

This function outputs a dictionary ({}) of Pandas DataFrames.

Is there an efficient way to iterate over this dictionary and output individual DataFrames? Let's say my dictionary is called analysisdict:

import pandas as pd

for key in analysisdict.keys():
    # Put this entry's X and Y columns side by side in one DataFrame
    dfx = pd.concat([analysisdict[key]['X'], analysisdict[key]['Y']], axis=1)

Where dfx would be an individual dataframe. (I'm guessing a second loop might be required? Perhaps I should iterate through a list of df names?)

The output would be df1...dfn
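In the loop above, `dfx` is rebound on every pass, so only the last entry's result survives. A minimal sketch, assuming each value in `analysisdict` is a DataFrame with `X` and `Y` columns (the key names and numbers below are hypothetical sample data), keeps every result by storing it in a new dictionary under the original key:

```python
import pandas as pd

# Hypothetical stand-in for the function's output: one DataFrame per .txt file.
analysisdict = {
    "sensor1": pd.DataFrame({"X": [0.0, 0.1, 0.2], "Y": [1.0, 1.2, 1.1], "T": [20, 20, 21]}),
    "sensor2": pd.DataFrame({"X": [0.0, 0.1, 0.2], "Y": [0.8, 0.9, 1.0], "T": [22, 22, 22]}),
}

# Store each X/Y frame under the key it came from instead of overwriting one name.
combined = {}
for key, df in analysisdict.items():
    combined[key] = pd.concat([df["X"], df["Y"]], axis=1)

# combined["sensor1"] is now the X/Y DataFrame for the first file.
print(combined["sensor1"])
```

Selecting `df[["X", "Y"]]` gives the same two-column frame without `pd.concat`; the loop keeps the original call so the change is easy to spot.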

Aside from an unnecessary call to `analysisdict.keys`, it looks like your code should work... what exactly isn't working? — juanpa.arrivillaga, Dec 28 '17 at 20:14
I was hoping to create individual dataframes, one for each key in the dictionary — Adi, Dec 28 '17 at 20:32

Peter Leimbigler · Accepted Answer · 2017-12-28T20:32:33.770


EDIT: I initially misread your question, and thought you wanted to concatenate all the DataFrames into one. This does that:

dfx = pd.concat([df for df in analysisdict.values()], ignore_index=True)

(Thanks to @paul-h for the ignore_index=True tip)
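The effect of `ignore_index=True` is easiest to see when every frame carries the default 0-based index. A small sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample dictionary; each frame has its own 0..n-1 index.
analysisdict = {
    "sensor1": pd.DataFrame({"X": [0.0, 0.1], "Y": [1.0, 1.2]}),
    "sensor2": pd.DataFrame({"X": [0.0, 0.1], "Y": [0.8, 0.9]}),
}

# Without ignore_index the original row labels are kept, so 0 and 1 repeat.
stacked = pd.concat([df for df in analysisdict.values()])

# With ignore_index=True the result gets a fresh 0..n-1 index.
dfx = pd.concat([df for df in analysisdict.values()], ignore_index=True)

print(stacked.index.tolist())
print(dfx.index.tolist())
```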

I read your question more carefully and realized that you're asking how to assign each DataFrame in your dictionary to its own variable, resulting in separate DataFrames named df1, df2, ..., dfn. Everything in my experience says that dynamically creating variables in this way is an anti-pattern, and best left to dictionaries. Check out the discussion here: How can you dynamically create variables via a while loop?
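If the names df1 ... dfn matter, a dictionary can carry them as keys, giving the same lookup without creating variables at runtime. A minimal sketch, assuming the hypothetical keys below and relying on dicts keeping insertion order (Python 3.7+):

```python
import pandas as pd

# Hypothetical sample: three sensor frames keyed by whatever the analysis used.
analysisdict = {
    name: pd.DataFrame({"X": [0.0, 0.1], "Y": [float(i), float(i) + 0.5]})
    for i, name in enumerate(["sensor_a", "sensor_b", "sensor_c"], start=1)
}

# Re-key the same DataFrame objects as "df1", "df2", ...;
# frames["df1"] plays the role a variable named df1 would have.
frames = {f"df{n}": df for n, df in enumerate(analysisdict.values(), start=1)}

for name, df in frames.items():
    print(name, df.shape)
```

No data is copied: each entry in `frames` is the same object that sits in `analysisdict`.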

edited Dec 28 '17 at 20:32

answered Dec 28 '17 at 20:14

Peter Leimbigler


Assuming this works, you could drop the `[]`. But this has nothing to do with the requested answer. – Anton vBR Dec 28 '17 at 20:16
@AntonvBR what's the benefit of that? – Paul H Dec 28 '17 at 20:18
@PaulH Use a generator instead of a list comprehension; it should be faster. – Anton vBR Dec 28 '17 at 20:19

I would add `ignore_index=True` to the call to `pd.concat` so that you don't end up with many duplicates in the index. – Paul H Dec 28 '17 at 20:19
@AntonvBR you have to iterate through either way. I don't see a benefit to lazy evaluation here. – Paul H Dec 28 '17 at 20:19
@AntonvBR, you're right! Without the `[]`, the argument becomes a generator expression (right?). Discussion for anyone else who's new to this: https://stackoverflow.com/questions/10998521/square-braces-not-required-in-list-comprehensions-when-used-in-a-function – Peter Leimbigler Dec 28 '17 at 20:20
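Both spellings hand `pd.concat` the same frames, and `pd.concat` needs all of them before it can combine anything, so lazy evaluation buys little here. One syntax detail: a generator expression may drop its own parentheses only when it is the sole argument, so adding `ignore_index=True` requires wrapping it. A small check with hypothetical data:

```python
import pandas as pd

analysisdict = {
    "sensor1": pd.DataFrame({"X": [0.0, 0.1], "Y": [1.0, 1.2]}),
    "sensor2": pd.DataFrame({"X": [0.0, 0.1], "Y": [0.8, 0.9]}),
}

# List comprehension: the list is fully built before pd.concat is called.
from_list = pd.concat([df for df in analysisdict.values()], ignore_index=True)

# Generator expression: it needs its own parentheses here because it is not
# the only argument (pd.concat(df for df in ..., ignore_index=True) is a SyntaxError).
from_gen = pd.concat((df for df in analysisdict.values()), ignore_index=True)

# Both produce identical DataFrames.
pd.testing.assert_frame_equal(from_list, from_gen)
```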
Sorry for the confusion! What would you recommend doing to iterate over the dataframes in the dictionary and save them as independent dataframes (without specifying names for the new dataframes)? The loop would yield 3 distinct dataframes from a dictionary of 3 dataframes. – Adi Dec 28 '17 at 20:33
@Adi, that depends on what you plan to do with them. The use cases I can imagine all involve either keeping the DataFrames in the dictionary, or concatenating them all into a consolidated DataFrame. – Peter Leimbigler Dec 28 '17 at 22:16
@PeterLeimbigler Specifically, plot them. Each dataframe within the dictionary corresponds to a sensor readout (time and current series). – Adi Dec 28 '17 at 22:19
@Adi, thanks for the quick reply. For quick and dirty plotting, I would personally leave the DataFrames in the dict and loop through with `for i, df in analysisdict.items()`, using `i` to set plot values specific to each DF (color, linewidth, etc.). If plotting or further analysis gets complicated, it might be worth adding an extra identifier column to each DF (`for i, d in analysisdict.items(): d['df_id'] = i`), then concatenating all of the DFs together with the line in my answer. – Peter Leimbigler Dec 28 '17 at 22:26
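The identifier-column approach from the comment above can be sketched as follows, with hypothetical sample data. It swaps in `DataFrame.assign` for `d['df_id'] = i`, so the frames in `analysisdict` are copied rather than modified in place; plotting calls (for example via matplotlib) would go inside the final loop and are left out to keep the sketch dependency-free.

```python
import pandas as pd

# Hypothetical sensor readouts: X is time, Y is current.
analysisdict = {
    "sensor1": pd.DataFrame({"X": [0.0, 0.1], "Y": [1.0, 1.2]}),
    "sensor2": pd.DataFrame({"X": [0.0, 0.1], "Y": [0.8, 0.9]}),
}

# Tag each frame with the key it came from; assign() returns a new DataFrame.
tagged = [df.assign(df_id=key) for key, df in analysisdict.items()]
alldata = pd.concat(tagged, ignore_index=True)

# The identifier column lets you pull each sensor's rows back out for
# per-sensor plotting or summaries.
for sensor, group in alldata.groupby("df_id"):
    print(sensor, len(group), group["Y"].max())
```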

@peterleimbigler I'll try this out right now! Thank you for your reply! – Adi Jan 02 '18 at 19:37
@Adi, you're welcome, and happy coding! – Peter Leimbigler Jan 02 '18 at 20:33
