20

I want to write code that creates multiple DataFrames whose names follow the pattern word_0000, where the four digits are the month and year. For example, I'd like to create the following DataFrames:

df_0115, df_0215, df_0315, ... , df_1215
stat_0115, stat_0215, stat_0315, ... , stat_1215
Ana

2 Answers

34

I suggest that you create a dictionary to hold the DataFrames. That way you will be able to index them with a month-day key:

import datetime as dt 
import numpy as np
import pandas as pd

dates_list = [dt.datetime(2015,11,i+1) for i in range(3)]
month_day_list = [d.strftime("%m%d") for d in dates_list]

dataframe_collection = {} 

for month_day in month_day_list:
    new_data = np.random.rand(3,3)
    dataframe_collection[month_day] = pd.DataFrame(new_data, columns=["one", "two", "three"])

for key in dataframe_collection.keys():
    print("\n" +"="*40)
    print(key)
    print("-"*40)
    print(dataframe_collection[key])

The code above prints out the following result:

========================================
1102
----------------------------------------
        one       two     three
0  0.896120  0.742575  0.394026
1  0.414110  0.511570  0.268268
2  0.132031  0.142552  0.074510

========================================
1103
----------------------------------------
        one       two     three
0  0.558303  0.259172  0.373240
1  0.726139  0.283530  0.378284
2  0.776430  0.243089  0.283144

========================================
1101
----------------------------------------
        one       two     three
0  0.849145  0.198028  0.067342
1  0.620820  0.115759  0.809420
2  0.997878  0.884883  0.104158
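
As a rough sketch, the same dictionary approach can be adapted to the month-year names from the question (df_0115 ... df_1215 and stat_0115 ... stat_1215). The helper name `build_collections`, the 3x3 random data, and the use of `describe()` for the "stat" frames are assumptions made for illustration and are not part of the original answer:

```python
import numpy as np
import pandas as pd


def build_collections(year=2015):
    """Return a dict of DataFrames keyed df_MMYY and stat_MMYY (hypothetical helper)."""
    frames = {}
    for month in range(1, 13):
        suffix = "{:02d}{:02d}".format(month, year % 100)  # e.g. "0115"
        data = pd.DataFrame(np.random.rand(3, 3), columns=["one", "two", "three"])
        frames["df_" + suffix] = data
        # Placeholder statistics; replace with whatever "stat" means in your case.
        frames["stat_" + suffix] = data.describe()
    return frames


frames = build_collections()
print(sorted(frames)[:3])
```

Each frame is then available by name, for example `frames["df_0315"]`, without creating 24 separate variables.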
Pedro M Duarte
  • Thank you Pedro! Is it necessary to do the `new_dataframe =A` and `dataframe_collection[month_day] = new_dataframe` like this? I just did `dataframe_collection[month_day] = A`. – Ana Nov 25 '15 at 15:43
  • I am also curious why the print procedure prints the dataframes in a random order! In my case it does not matter, it's just a general question. – Ana Nov 25 '15 at 15:45
  • 1
    Hi Ana, what you did is correct. There is no need for the `new_dataframe` intermediate variable. I updated the answer to reflect that. As for the random order in which the result is printed, this has to do with Python's implementation of the dictionary. The dictionary key-value pairs are stored in a data structure called a hash table. This data structure is designed for very fast lookups and, as part of the algorithm to achieve this, the order in which the keys are stored can appear random. – Pedro M Duarte Nov 25 '15 at 16:00
  • 1
    If your application requires you to iterate over the dictionary keys in a predictable order, I recommend that you import the `collections` module and use an `OrderedDict` rather than a plain `dict` to collect your dataframes: `dataframe_collection = collections.OrderedDict()`. Note that an `OrderedDict` remembers the order in which keys were inserted, so insert them in sorted order if you want sorted iteration. – Pedro M Duarte Nov 25 '15 at 16:03
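
To illustrate the ordering discussed in these comments, here is a small sketch. An `OrderedDict` keeps keys in insertion order, while `sorted()` returns the keys in sorted order regardless of how the dictionary stores them. (Since Python 3.7 a plain `dict` also preserves insertion order.) The key values and random data below are illustrative assumptions:

```python
import collections

import numpy as np
import pandas as pd

dataframe_collection = collections.OrderedDict()
for month_day in ["1103", "1101", "1102"]:  # deliberately unsorted keys
    dataframe_collection[month_day] = pd.DataFrame(
        np.random.rand(3, 3), columns=["one", "two", "three"]
    )

insertion_order = list(dataframe_collection)  # order in which keys were added
sorted_order = sorted(dataframe_collection)   # keys ordered by value

for key in sorted_order:
    print(key, dataframe_collection[key].shape)
```

Iterating over `sorted(dataframe_collection)` is usually the simplest option when only the printing order matters.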
8

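`df` will be a list containing one DataFrame per CSV file in the current directory; use `df[0]` to access the first one.

import glob
import pandas as pd

df = []
files = glob.glob("*.csv")
for a in files:
    df.append(pd.read_csv(a))  # one DataFrame per file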
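
If you would rather look the frames up by name than by position, the list can be swapped for a dictionary keyed by file name. This is only a sketch: `load_csv_frames` is a hypothetical helper, and the demo writes two throwaway CSV files to a temporary directory so that it runs on its own.

```python
import glob
import os
import tempfile

import pandas as pd


def load_csv_frames(directory):
    """Map each CSV file's base name (without .csv) to its DataFrame (hypothetical helper)."""
    frames = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        name = os.path.splitext(os.path.basename(path))[0]
        frames[name] = pd.read_csv(path)
    return frames


# Demo with two throwaway files so the example is self-contained.
demo_dir = tempfile.mkdtemp()
for name in ["df_0115", "df_0215"]:
    pd.DataFrame({"one": [1, 2], "two": [3, 4]}).to_csv(
        os.path.join(demo_dir, name + ".csv"), index=False
    )

frames = load_csv_frames(demo_dir)
print(sorted(frames))
```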
ChrisMM