1

Suppose I have the following dataframe :

    X       Y
   ---+---+---+---
    A | B | A | B
--+---+---+---+---
0 | 1 | 2 | 3 | 4
1 | 5 | 6 | 7 | 8
2 | 9 | 10| 11| 12

I want to split it based on the multilevel index recursively and save them in a csv file.

For example a file name X_A.csv should contain the following dataframe:

    X 
   ---
    A 
--+---
0 | 1 
1 | 5 
2 | 9 

Similarly the file X_B.csv should store a dataframe as :

    X 
   ---
    B 
--+---
0 | 2 
1 | 6 
2 | 10

and so on for Y_A and Y_B.

I am looking for a Pythonic ( or efficient) way to do this rather than iterating over the column values separately as the code is quite large. I tried using the techniques mentioned here by dropping the column levels and storing the individual columns but I want to it in such a way that I don't have to explicitly mention the column names since the dataframe may expand ( i.e. at the top most level there might be 4 columns say W, X, Y and Z).

cs95
  • 379,657
  • 97
  • 704
  • 746
Gambit1614
  • 8,547
  • 1
  • 25
  • 51
  • Why not combine two level column names to one – BENY Sep 11 '17 at 19:44
  • If panda.Dataframe subclass numpy.Ndarray, can't you use [numpy indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#arrays-indexing) on it to select the columns you're interested in, and then use the object's method to exoprt to CSV – Nicolas David Sep 11 '17 at 20:16
  • @Wen I do not want to change the original df that is why I don't want to combine the two level column names. – Gambit1614 Sep 12 '17 at 02:28

3 Answers3

4
list_of_df = [df[i].to_frame() for i in df.columns]

With @JohnGalt's suggestion to csv:

_ = [df[i].to_frame().to_csv('{0}_{1}'.format(*i)) for i in df.columns]

Output:

list_of_df[0]

   X
   A
0  1
1  5
2  9

list_of_df[1]

    X
    B
0   2
1   6
2  10

...

list_of_df[3]

    Y
    B
0   4
1   8
2  12
Zero
  • 74,117
  • 18
  • 147
  • 154
Scott Boston
  • 147,308
  • 15
  • 139
  • 187
2

Option 1

Rename df.columns

df.columns = ['_'.join(x) for x in df.columns]

df

   X_A  X_B  Y_A  Y_B
0                    
0    1    2    3    4
1    5    6    7    8
2    9   10   11   12

for c in df:
     df[c].reset_index().to_csv(c + '.csv')

Option 2

groupby levels

for name, g in df.groupby(level=[0, 1], axis=1):
    g.to_csv('{0}_{1}.csv'.format(*name))
cs95
  • 379,657
  • 97
  • 704
  • 746
2

You can use a simple apply for that.

Create example df:

import pandas as pd
import numpy as np

index = pd.MultiIndex(levels=[['X', 'Y'], ['A', 'B']],labels=[[0, 0, 1, 1], [0, 1, 0, 1]])
df = pd.DataFrame(columns=index, data=np.arange(12).reshape((3, 4)))

>>> df
Out[]:
   X      Y
   A  B   A   B
0  0  1   2   3
1  4  5   6   7
2  8  9  10  11

Save as separate CSV files:

df.apply(lambda x: x.to_csv('_'.join(x.name) + '.csv'), axis=0)

This will produce the desired output. For example X_A.csv contains:

0,0
1,4
2,8

If you want to include the axis name into the csv, add header=True as argument to to_csv, then X_A.csv will contain:

,X
,A
0,0
1,4
2,8
FabienP
  • 3,018
  • 1
  • 20
  • 25