How to create an array of dataframes in Python

Question

I want to write code that creates several collections of DataFrames whose names follow the pattern word_0000, where the four digits are the month and year. For example, I'd like to create the following DataFrames:

df_0115, df_0215, df_0315, ... , df_1215
stat_0115, stat_0215, stat_0315, ... , stat_1215

Better to use a dictionary: `df['0115'], df['0215'], stat['0115'], stat['0215']`, etc. – furas, Nov 25 '15 at 03:14

Pedro M Duarte · Accepted Answer · 2015-11-25T15:56:02.730

34

I suggest that you create a dictionary to hold the DataFrames. That way you will be able to index them with a month-day key:

import datetime as dt 
import numpy as np
import pandas as pd

dates_list = [dt.datetime(2015,11,i+1) for i in range(3)]
month_day_list = [d.strftime("%m%d") for d in dates_list]

dataframe_collection = {} 

for month_day in month_day_list:
    new_data = np.random.rand(3,3)
    dataframe_collection[month_day] = pd.DataFrame(new_data, columns=["one", "two", "three"])

for key in dataframe_collection.keys():
    print("\n" +"="*40)
    print(key)
    print("-"*40)
    print(dataframe_collection[key])

The code above prints out the following result:

========================================
1102
----------------------------------------
        one       two     three
0  0.896120  0.742575  0.394026
1  0.414110  0.511570  0.268268
2  0.132031  0.142552  0.074510

========================================
1103
----------------------------------------
        one       two     three
0  0.558303  0.259172  0.373240
1  0.726139  0.283530  0.378284
2  0.776430  0.243089  0.283144

========================================
1101
----------------------------------------
        one       two     three
0  0.849145  0.198028  0.067342
1  0.620820  0.115759  0.809420
2  0.997878  0.884883  0.104158
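
The question asked for keys made of month and year (0115 through 1215) and for two parallel collections, df and stat, whereas the example above uses month-day keys. A minimal sketch of that variant might look like the following. It assumes the year 2015, uses random numbers as placeholder data, and uses `describe()` as a stand-in for whatever statistics are actually needed; the names `keys`, `df`, and `stat` are illustrative.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Keys "0115" ... "1215": two-digit month followed by two-digit year (2015 assumed).
keys = [dt.date(2015, month, 1).strftime("%m%y") for month in range(1, 13)]

df = {}    # one DataFrame per month, e.g. df["0115"]
stat = {}  # one summary DataFrame per month, e.g. stat["0115"]

for key in keys:
    # Placeholder data; replace with the real monthly data.
    df[key] = pd.DataFrame(np.random.rand(3, 3), columns=["one", "two", "three"])
    # Placeholder statistics; describe() is only one possible summary.
    stat[key] = df[key].describe()

print(keys[0], keys[-1])
print(stat["0315"].loc["mean"])
```

Two dictionaries keep the `df['0115']` / `stat['0115']` access pattern suggested in the comment on the question.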

edited Nov 25 '15 at 15:56

answered Nov 25 '15 at 03:18

Pedro M Duarte


Thank you Pedro! Is it necessary to do the `new_dataframe = A` and `dataframe_collection[month_day] = new_dataframe` like this? I just did `dataframe_collection[month_day] = A`. – Ana Nov 25 '15 at 15:43
I am also curious why the print procedure prints the dataframes in a random order! In my case it does not matter, it's just a general question. – Ana Nov 25 '15 at 15:45

Hi Ana, what you did is correct. There is no need for the `new_dataframe` intermediate variable. I updated the answer to reflect that. As for the random order in which the result is printed, this has to do with Python's implementation of the dictionary. The dictionary key-value pairs are stored in a data structure called a hash table. This data structure is designed for very fast lookups and, as part of the algorithm to achieve this, the order in which the keys are stored can look random. – Pedro M Duarte Nov 25 '15 at 16:00

If your application requires you to iterate over the dictionary keys in a predictable order, I recommend that you import the `collections` module and use an `OrderedDict` rather than a plain `dict` to collect your dataframes. An `OrderedDict` iterates over its keys in the order in which they were inserted: `dataframe_collection = collections.OrderedDict()` – Pedro M Duarte Nov 25 '15 at 16:03
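
To make the ordering discussion concrete, here is a small sketch of two ways to get a predictable iteration order. An `OrderedDict` remembers insertion order rather than sorting its keys, while `sorted()` yields the keys in sorted order for any kind of dictionary. Since Python 3.7 a plain `dict` also preserves insertion order, so the scrambled output shown in the answer is specific to older interpreters. The keys and the tiny DataFrames below are made up for illustration.

```python
from collections import OrderedDict

import pandas as pd

dataframe_collection = OrderedDict()

# Insert deliberately out of order to show the difference.
for month_day in ["1103", "1101", "1102"]:
    dataframe_collection[month_day] = pd.DataFrame({"one": [1, 2, 3]})

# Insertion order: the order in which the keys were added.
insertion_order = list(dataframe_collection)

# Sorted order: works for any dict, ordered or not.
sorted_order = sorted(dataframe_collection)

print(insertion_order)
print(sorted_order)

for key in sorted_order:
    print(key, dataframe_collection[key].shape)
```

If sorted output is all that is needed, looping over `sorted(dataframe_collection)` is probably the simpler choice, because it does not depend on how the dictionary was filled.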

score 8 · Answer 2 · edited Dec 15 '19 at 19:22


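The list `df` will hold one DataFrame for each CSV file in the current directory. Use `df[0]` to access the first one:

import glob
import pandas as pd

df = []
files = glob.glob("*.csv")  # every CSV file in the current directory
for a in files:
    df.append(pd.read_csv(a))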
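
A list is indexed only by position, and `glob.glob` does not guarantee any particular file order, so it can be hard to tell which file ended up at which index. One possible variant, in the spirit of the accepted answer, keys each DataFrame by its file name instead. The sketch below writes two small throwaway CSV files to a temporary directory so that it runs on its own; the file names, their contents, and the name `frames` are invented for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Throwaway input files, invented for this example.
    (folder / "0115.csv").write_text("a,b\n1,2\n3,4\n")
    (folder / "0215.csv").write_text("a,b\n5,6\n")

    # Key each DataFrame by the file name without its extension.
    frames = {path.stem: pd.read_csv(path) for path in sorted(folder.glob("*.csv"))}

print(list(frames))
print(frames["0115"])
```

With real data, `folder` would point at the directory that holds the CSV files, and the temporary-file setup would be dropped.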

edited Dec 15 '19 at 19:22

ChrisMM


answered Dec 15 '19 at 18:17

Malik Mussabeheen Noor

