Should I use a dictionary or a series to hold a bunch of dataframes?

Question

Suppose I have several dataframes: df1, df2, df3, etc. Each dataframe has a label: A1, A2, A3, etc. I want to keep this information together so that I can pass it around as a whole. Three methods came to mind:

method 1

use a label list: labels=["A1", "A2", "A3"...] and a list of dataframes dfs=[df1, df2, df3...].

method 2

use a dictionary: d={"A1": df1, "A2": df2, "A3": df3}.

method 3

use a pandas series: s=pd.Series([df1, df2, df3], index=["A1", "A2", "A3"]).

I will use the labels and dataframes sequentially, so I think method 1 or method 3 should be my choice. However, method 1 requires me to pass two objects, while method 3 needs only one. Is it common practice to put dataframes in a series? I seldom see people do this; is it against best practice? Are there any better suggestions?

I like the second one. A list of tuples is also possible: `L=[("A1", df1), ("A2", df2), ("A3", df3)]` — jezrael, Oct 01 '18 at 13:00
"I will the label use sequentially," - can you explain what you mean exactly? — Stuart, Oct 01 '18 at 13:02
As long as the order of processing doesn't matter, I would use method 2 - it just explains the relationship label -> dataframe much better. — Daren Thomas, Oct 01 '18 at 13:03
yeah @DarenThomas 1:1 relationship == mapping, which is dictionary in python — deadvoid, Oct 01 '18 at 13:06
pandas series is a numpy array with labels, and as such designed mainly for numerical operations. I guess that's why people don't often use it to store objects like dataframes, but doesn't necessarily mean it shouldn't be used for this. — Stuart, Oct 01 '18 at 13:09
@DarenThomas Actually I will use the label and the corresponding dataframes sequentially. — an offer can't refuse, Oct 01 '18 at 13:12
Method 4: if all dataframes are similar, put them together. df = pd.concat(d) — piRSquared, Oct 01 '18 at 13:48
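
The `pd.concat(d)` suggestion above can be sketched roughly as follows. The three toy frames and the column name `x` are placeholders, not part of the question; the sketch assumes all dataframes share the same columns.

```python
import pandas as pd

# Toy stand-ins for df1, df2, df3 (hypothetical data).
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})
df3 = pd.DataFrame({"x": [5, 6]})

d = {"A1": df1, "A2": df2, "A3": df3}

# Concatenating a dict uses its keys as the outer level of a MultiIndex.
combined = pd.concat(d)

# The rows that came from one dataframe can be recovered by label.
part_a2 = combined.loc["A2"]
outer_labels = list(combined.index.get_level_values(0).unique())
```

This keeps everything in a single dataframe, at the cost of requiring the individual frames to be similar enough to stack.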

Stuart · Accepted Answer · 2018-10-01T13:38:54.033

An OrderedDict would probably be more conventional than using a series for this.

from collections import OrderedDict
d = OrderedDict([("A1", df1), ("A2", df2), ("A3", df3)])

This can easily be iterated over:

# Iterating over d alone yields only the keys, so use .items() for (label, df) pairs.
for label, df in d.items():
    print(label, df)

That said, I can't see any strong reason not to use a pandas series. A small advantage of using the series is that you can access the dataframes using dot notation (s.A1, s.A2, etc.) as well as the dictionary-like notation s["A1"]. Dot notation only works when the label is a valid Python identifier and does not clash with an existing Series attribute or method. Using a series, it would also be relatively easy to sort the dataframes, insert additional ones in the middle, or associate additional metadata with them later if needed.
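
A minimal sketch of the series option, assuming toy dataframes with a hypothetical column `x` in place of the real df1, df2, df3:

```python
import pandas as pd

# Toy stand-ins for df1, df2, df3 (hypothetical data).
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})
df3 = pd.DataFrame({"x": [5, 6]})

# A Series with object dtype, holding one dataframe per label.
s = pd.Series([df1, df2, df3], index=["A1", "A2", "A3"])

first = s.A1       # dot notation
second = s["A2"]   # dictionary-like notation

# Sequential use of labels and dataframes together.
seen = []
for label, df in s.items():
    seen.append(label)

# Reordering by label is a single call.
reversed_s = s.sort_index(ascending=False)
```

Because the values are arbitrary objects, numerical Series methods such as `s.sum()` are not meaningful here; the series is acting only as an ordered, labelled container.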

(See this question on dictionary ordering in Python 3.6 and 3.7 - you may be able to use an ordinary dictionary instead of an OrderedDict if using Python 3.7 and you don't need to use other 'ordered' behaviours. In Python 3.6, the preservation of insertion order is an implementation detail and should not be relied upon.)
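
As a rough illustration of the plain-dictionary route on Python 3.7 or later, the two lists from method 1 can be zipped into a single dict that is then iterated in insertion order. The toy dataframes and the column name `x` are assumptions made for the example.

```python
import pandas as pd

# Toy stand-ins for df1, df2, df3 (hypothetical data).
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})
df3 = pd.DataFrame({"x": [5, 6]})

labels = ["A1", "A2", "A3"]
dfs = [df1, df2, df3]

# Method 1's two lists collapse into a single mapping (method 2).
d = dict(zip(labels, dfs))

# On Python 3.7+ a plain dict iterates in insertion order.
order = [label for label, df in d.items()]
row_counts = {label: len(df) for label, df in d.items()}
```

An OrderedDict is still the better fit if you need its extra behaviour, such as `move_to_end` or order-sensitive equality.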

If there is nothing wrong with the series option, I strongly prefer using that. — an offer can't refuse, Oct 01 '18 at 13:55

korakot · Answer 2 · score 2 · answered Oct 01 '18 at 13:20

Method 2 also works. Since Python 3.6, a dict remembers the order in which its keys were inserted (an implementation detail in CPython 3.6, guaranteed by the language from Python 3.7).

Can you expand on this? What's new on dict in 3.6? – an offer can't refuse Oct 01 '18 at 13:21
When you loop through the dict, it will return the items in the order A1, A2, A3 (the same order in which it was created). So you can use `for k in d` or `for k, v in d.items()`. – korakot Oct 01 '18 at 13:24
Could you please provide a link or source where the order is stated to stay the same? – Sasha Tsukanov Oct 01 '18 at 16:08
https://www.blog.pythonlibrary.org/2018/02/27/python-3-7-dictionaries-now-ordered/ – korakot Oct 01 '18 at 17:01
