Split Multilevel dataframe into different csv files

Question

Suppose I have the following dataframe :

    X       Y
   ---+---+---+---
    A | B | A | B
--+---+---+---+---
0 | 1 | 2 | 3 | 4
1 | 5 | 6 | 7 | 8
2 | 9 | 10| 11| 12

I want to split it based on the multilevel index recursively and save them in a csv file.

For example a file name X_A.csv should contain the following dataframe:

    X 
   ---
    A 
--+---
0 | 1 
1 | 5 
2 | 9

Similarly the file X_B.csv should store a dataframe as :

    X 
   ---
    B 
--+---
0 | 2 
1 | 6 
2 | 10

and so on for Y_A and Y_B.

I am looking for a Pythonic ( or efficient) way to do this rather than iterating over the column values separately as the code is quite large. I tried using the techniques mentioned here by dropping the column levels and storing the individual columns but I want to it in such a way that I don't have to explicitly mention the column names since the dataframe may expand ( i.e. at the top most level there might be 4 columns say W, X, Y and Z).

If panda.Dataframe subclass numpy.Ndarray, can't you use [numpy indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#arrays-indexing) on it to select the columns you're interested in, and then use the object's method to exoprt to CSV — Nicolas David, Sep 11 '17 at 20:16
@Wen I do not want to change the original df that is why I don't want to combine the two level column names. — Gambit1614, Sep 12 '17 at 02:28

score 4 · Accepted Answer · edited Sep 11 '17 at 19:53

4

list_of_df = [df[i].to_frame() for i in df.columns]

With @JohnGalt's suggestion to csv:

_ = [df[i].to_frame().to_csv('{0}_{1}'.format(*i)) for i in df.columns]

Output:

list_of_df[0]

list_of_df[1]

...

list_of_df[3]

edited Sep 11 '17 at 19:53

Zero

74,117
18
147
154

answered Sep 11 '17 at 19:45

Scott Boston

147,308
15
139
187

Neat and `...to_csv('{0}_{1}.csv'.format(*i))` for csv. – Zero Sep 11 '17 at 19:48

cs95 · Answer 2 · 2017-09-11T20:06:15.957

2

Option 1

Rename df.columns

df.columns = ['_'.join(x) for x in df.columns]

df

   X_A  X_B  Y_A  Y_B
0                    
0    1    2    3    4
1    5    6    7    8
2    9   10   11   12

for c in df:
     df[c].reset_index().to_csv(c + '.csv')

Option 2

groupby levels

for name, g in df.groupby(level=[0, 1], axis=1):
    g.to_csv('{0}_{1}.csv'.format(*name))

edited Sep 11 '17 at 20:06

answered Sep 11 '17 at 19:40

cs95

379,657
97
704
746

score 2 · Answer 3 · answered Sep 11 '17 at 20:04

You can use a simple apply for that.

Create example df:

import pandas as pd
import numpy as np

index = pd.MultiIndex(levels=[['X', 'Y'], ['A', 'B']],labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
df = pd.DataFrame(columns=index, data=np.arange(12).reshape((3, 4)))

>>> df
Out[]:
   X      Y
   A  B   A   B
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Save as separate CSV files:

df.apply(lambda x: x.to_csv('_'.join(x.name) + '.csv'), axis=0)

This will produce the desired output. For example X_A.csv contains:

0,0
1,4
2,8

If you want to include the axis name into the csv, add header=True as argument to to_csv, then X_A.csv will contain:

,X
,A
0,0
1,4
2,8

Split Multilevel dataframe into different csv files

3 Answers3

Create example df:

Save as separate CSV files: