Suppose I have several dataframes: df1, df2, df3, etc. Each dataframe has a label: A1, A2, A3, etc. I want to bundle the labels and dataframes together so that I can pass them around as a single unit. Three methods came to mind:

Method 1

Use a list of labels, labels = ["A1", "A2", "A3"...], and a list of dataframes, dfs = [df1, df2, df3...].

Method 2

Use a dictionary: d = {"A1": df1, "A2": df2, "A3": df3}.

Method 3

Use a pandas series: s = pd.Series([df1, df2, df3], index=["A1", "A2", "A3"]).

I will use the labels and dataframes sequentially, so I think method 1 or method 3 should be my choice. However, method 1 requires me to pass two objects, while with method 3 I only need to keep one. Is it common practice to put dataframes in a series? I seldom see people do this; is it against best practice? Are there any better suggestions?
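For concreteness, here is a minimal sketch of how each of the three methods could be walked through in order. The three tiny dataframes and the `order_*` lists are made-up placeholders for illustration only.

```python
import pandas as pd

# Placeholder dataframes (hypothetical data, only for illustration)
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})
df3 = pd.DataFrame({"x": [5, 6]})

# Method 1: two parallel lists, paired up with zip()
labels = ["A1", "A2", "A3"]
dfs = [df1, df2, df3]
order_1 = [label for label, df in zip(labels, dfs)]

# Method 2: a dictionary, iterated as (label, dataframe) pairs
d = {"A1": df1, "A2": df2, "A3": df3}
order_2 = [label for label, df in d.items()]

# Method 3: a Series of dataframes, iterated as (label, dataframe) pairs
s = pd.Series([df1, df2, df3], index=["A1", "A2", "A3"])
order_3 = [label for label, df in s.items()]
```

Methods 2 and 3 both keep everything in one object; method 1 needs the two lists to stay in sync.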

an offer can't refuse
  • I like the second one. Is it also possible to use a list of tuples, `L=[("A1", df1), ("A2", df2), ("A3", df3)]`? – jezrael Oct 01 '18 at 13:00
  • "I will the label use sequentially," - can you explain what you mean exactly? – Stuart Oct 01 '18 at 13:02
  • As long as the order of processing doesn't matter, I would use method 2 - it just explains the relationship label -> dataframe much better. – Daren Thomas Oct 01 '18 at 13:03
  • yeah @DarenThomas 1:1 relationship == mapping, which is dictionary in python – deadvoid Oct 01 '18 at 13:06
  • pandas series is a numpy array with labels, and as such designed mainly for numerical operations. I guess that's why people don't often use it to store objects like dataframes, but doesn't necessarily mean it shouldn't be used for this. – Stuart Oct 01 '18 at 13:09
  • @DarenThomas Actually I will use the label and the corresponding dataframes sequentially. – an offer can't refuse Oct 01 '18 at 13:12
  • Method 4: if all dataframes are similar, put them together. df = pd.concat(d) – piRSquared Oct 01 '18 at 13:48
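The `pd.concat(d)` suggestion in the last comment might look like the sketch below, assuming the dataframes share the same columns. The sample data and the names `combined`, `a2` and `labels_seen` are hypothetical.

```python
import pandas as pd

# Hypothetical dataframes with identical columns
d = {
    "A1": pd.DataFrame({"x": [1, 2]}),
    "A2": pd.DataFrame({"x": [3, 4]}),
}

# Passing a dict to pd.concat uses its keys as the outer index level
combined = pd.concat(d)

# A single labelled dataframe can be pulled back out by its label
a2 = combined.loc["A2"]

# Iterate over the labelled pieces; sort=False keeps first-seen order
labels_seen = [label for label, part in combined.groupby(level=0, sort=False)]
```

This keeps one object to pass around, at the cost of requiring that the dataframes are similar enough to stack.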

2 Answers

An OrderedDict would probably be more conventional than using a series for this.

from collections import OrderedDict
d = OrderedDict([("A1", df1), ("A2", df2), ("A3", df3)])

This can easily be iterated over:

for label, df in d.items():
    print(label, df)

That said, I can't see any strong reason not to use a pandas series. A small advantage of the series is that you can access the dataframes using dot notation (s.A1, s.A2, etc.) as well as the dictionary-like notation s["A1"]. With a series, it would also be relatively easy to sort the dataframes, insert additional ones in the middle, or associate additional metadata with them later if needed.
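As a rough sketch of the series option, the snippet below shows both access styles and a re-sort by label. The placeholder dataframes and the name `by_label_desc` are assumptions for illustration, and dot notation only works when the labels are valid Python identifiers.

```python
import pandas as pd

# Placeholder dataframes (hypothetical data)
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})
df3 = pd.DataFrame({"x": [5, 6]})

s = pd.Series([df1, df2, df3], index=["A1", "A2", "A3"])

# Dot notation and dictionary-like notation reach the same dataframe
via_dot = s.A1
via_key = s["A1"]

# Re-order the stored dataframes by label, here in descending order
by_label_desc = s.sort_index(ascending=False)

# Sequential use: items() yields (label, dataframe) pairs in index order
for label, df in by_label_desc.items():
    print(label, df.shape)
```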

(See this question on dictionary ordering in Python 3.6 and 3.7. You may be able to use an ordinary dictionary instead of an OrderedDict if you are on Python 3.7 or later and don't need the other 'ordered' behaviours. In Python 3.6, the preservation of insertion order is an implementation detail and should not be relied upon.)
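To illustrate what the other 'ordered' behaviours can mean in practice, here is a small sketch assuming Python 3.7 or later. Strings stand in for the dataframes so the example stays self-contained; the variable names are hypothetical.

```python
from collections import OrderedDict

# Strings stand in for the dataframes (placeholders only)
plain = {"A1": "df1", "A2": "df2", "A3": "df3"}
plain_order = list(plain)  # insertion order is preserved for a plain dict

# OrderedDict can additionally re-order entries in place
ordered = OrderedDict(plain)
ordered.move_to_end("A1")
moved_order = list(ordered)

# Equality between two OrderedDicts is order-sensitive; plain dicts ignore order
same_as_ordered = OrderedDict([("a", 1), ("b", 2)]) == OrderedDict([("b", 2), ("a", 1)])
same_as_plain = {"a": 1, "b": 2} == {"b": 2, "a": 1}
```

If neither re-ordering nor order-sensitive comparison is needed, the plain dictionary is probably enough.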

Stuart

Method 2 also works. Since Python 3.6, a dictionary remembers the order in which its items were inserted (an implementation detail in 3.6, guaranteed by the language from 3.7).

korakot