2

I'm currently dealing with a set of similar DataFrames having a double Header. They have the following structure:

   age height weight shoe_size
   RHS height weight shoe_size
0  8.0    6.0    2.0       1.0
1  8.0    NaN    2.0       1.0
2  6.0    1.0    4.0       NaN
3  5.0    1.0    NaN       0.0
4  5.0    NaN    1.0       NaN
5  3.0    0.0    1.0       0.0

   height  weight shoe_size   age
      RHS  weight shoe_size   age
0     1.0    1.0        NaN   NaN
1     1.0    2.0        0.0   2.0
2     1.0    NaN        0.0   5.0
3     1.0    2.0        0.0   NaN
4     0.0    1.0        0.0   3.0

Actually the main differences are the sorting of the first Header row, which could be made the same for all of them, and the position of the RHS header column in the second Header row. I'm currently wondering if there is an easy way of saving/reading all these DataFrames into/from a single CSV file instead of having a different CSV file for each of them.

cs95
  • 379,657
  • 97
  • 704
  • 746
Juan Carlos
  • 367
  • 2
  • 8
  • 16
  • 1
    Does excel work? You can save each df on a separate sheet. – cs95 Aug 24 '17 at 11:24
  • It would be better using just a plain text CSV file, but excel would be ok too. – Juan Carlos Aug 24 '17 at 11:32
  • What is your question? It sounds like the issue you're encountering relates to the headers being inconsistent but then you say "which could be made the same for all of them"... If you're asking "can pandas read/write to a single csv or multiple csvs" the answer is yes. Which direction to go in depends on the scenario. – Andrew L Aug 24 '17 at 11:33
  • My question is if there exists a way of saving and then reading multiple DataSets with the above structure (slightly different) into/from a single CSV file. – Juan Carlos Aug 24 '17 at 11:36

2 Answers2

5

Unfortunately, there isn't any reasonable way to store multiple dataframes in a single CSV such that retrieving each one would not be excessively cumbersome, but you can use pd.ExcelWriter and save to separate sheets in a single .xlsx file:

import pandas as pd

writer = pd.ExcelWriter('file.xlsx')
for i, df in enumerate(df_list):
    df.to_excel(writer,'sheet{}'.format(i))
writer.save() 
cs95
  • 379,657
  • 97
  • 704
  • 746
0

Taking back your example (with random numbers instead of your values) :

import pandas as pd
import numpy as np

h1 = [['age', 'height', 'weight', 'shoe_size'],['RHS','height','weight','shoe_size']]
df1 = pd.DataFrame(np.random.randn(3, 4), columns=h1)

h2 = [['height', 'weight', 'shoe_size','age'],['RHS','weight','shoe_size','age']]
df2 = pd.DataFrame(np.random.randn(3, 4), columns=h2)

First, reorder your columns (How to change the order of DataFrame columns?) :

df3 = df2[h1[0]]

Then, concatenate the two dataframes (Merge, join, and concatenate) :

df4 = pd.concat([df1,df3])

I don't know how you want to deal with the second row of your header (for now, it's just using two sub-columns, which is not very elegant). If, to your point of view, this row is meaningless, just reset your header like you want before to concatenate :

df1.columns=h1[0]
df3.columns=h1[0]

df5 = pd.concat([df1,df3])

Finally, save it under CSV format (pandas.DataFrame.to_csv) :

df4.to_csv('file_name.csv',sep=',')
Achille Salaün
  • 134
  • 1
  • 8