2

I have 5 DataFrame with columns 'day', 'number', 'id', 'recordDay', and I put all the 5 dataframes in a dictionary. I would like to save 5 dataframes in 5 CSV files with file names based on 'id' and 'recordDay'. Here's the example of dataframe1 and dataframe2

df1                                       df2
     day     number   id   recordDay         day     number   id    recordDay
2017-03-21     17      1   1990-01-01     2016-03-21    6      2   1991-02-01
2017-03-22     19      1   1990-01-01     2016-03-22    8      2   1991-02-01
2017-03-23     21      1   1990-01-01     2016-03-23   10      2   1991-02-01

Is it possible to save 5 CSV files with file names like this, 'id_1_1991_01_01.csv', 'id_2_1991_02_01.csv', 'id_3_1991_03_01.csv','id_4_1991_04_01.csv', 'id_5_1991_05_01.csv' Or 'id_1.csv'...'id_5.csv'would be better?

I used the following code, but it only saved one CSV file.

pd.concat(df_dict).to_csv('data.csv', index = False, data_format = '%Y-%m-%d)
Himaprasoon
  • 2,609
  • 3
  • 25
  • 46
Peggy
  • 143
  • 3
  • 14
  • 1
    Why did you think that would save more than one file? – jonrsharpe Mar 21 '17 at 05:08
  • 1
    why don't you save your dataframes before putting them into dictionary? `.df{number}.to_csv('{file_name}.csv', index = False, data_format = '%Y-%m-%d)` – cur4so Mar 21 '17 at 05:08

1 Answers1

4

Iterate over the dictionary - using .iloc[] to get the recordID and id values for the name.

df1 = pandas.DataFrame(numpy.random.randn(3, 4), columns=[["day", "number", "id", "recordDay"]])
df2 = pandas.DataFrame(numpy.random.randn(3, 4), columns=[["day", "number", "id", "recordDay"]])
df3 = pandas.DataFrame(numpy.random.randn(3, 4), columns=[["day", "number", "id", "recordDay"]])

df_dict={"data_frame1":df1, "data_frame2": df2, "data_frame3": df3}

for name, df in df_dict.items():
    #get the id and recordDay values from each df
    df_id=df['id'].iloc[0]
    df_record_day=df['recordDay'].iloc[0]

    #generate a unique file name based on the id and record
    file_name="id_"+str(df_id)+"_"+str(df_record_day)+".csv"

    #create the CSV
    df.to_csv(file_name, index = False, data_format = '%Y-%m-%d')

Or you can use a list of arrays instead of a dictionary

df_list=[df1, df2, df3]

for df in df_list:
    #get the id and recordDay values from each df
    df_id=df['id'].iloc[0]
    df_record_day=df['recordDay'].iloc[0]

    #generate a unique file name based on the id and record
    file_name="id_"+str(df_id)+"_"+str(df_record_day)+".csv"

    #create the CSV
    df.to_csv(file_name, index = False, data_format = '%Y-%m-%d')
nanselm2
  • 1,397
  • 10
  • 11
  • Thank you very much!! I convert the `dictionary` into a `list` and run the above code to save the csv files. It works pretty well. Thank you. – Peggy Mar 21 '17 at 14:33
  • Just changed the answer to reflect using a dictionary - I didn't read carefully enough to realize your dfs were already in a dictionary – nanselm2 Mar 21 '17 at 16:47
  • @nanselms2 I just encountered the `TypeError: string indices must be integers` at line `df_id = df['id'].iloc[0]` when using the coding above. – Peggy Mar 23 '17 at 15:14
  • Okay, I'm not sure how your dictionary is set up, if the dataframes themselves are values, then you'll need to use .items() - see http://stackoverflow.com/questions/3294889/iterating-over-dictionaries-using-for-loops-in-python – nanselm2 Mar 23 '17 at 15:41
  • 1
    I changed the answer to reflect both list and dictionary - a list might be easier/make more sense. – nanselm2 Mar 23 '17 at 15:41
  • Thank you very much! Yes, I used `list` and I successfully saved data in csv file. – Peggy Mar 23 '17 at 18:01