output multiple files based on column value python pandas

Question

I have a sample pandas DataFrame:

import pandas as pd

df = {'ID': [73, 68,1,94,42,22, 28,70,47, 46,17, 19, 56, 33 ],
  'CloneID': [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4 ],
  'VGene': ['64D', '64D', '64D', 61, 61, 61, 311, 311, 311, 311, 311,  311, 311, 311]}
df = pd.DataFrame(df)

It looks like this:

df
Out[7]: 
    CloneID  ID VGene
0         1  73   64D
1         1  68   64D
2         1   1   64D
3         1  94    61
4         1  42    61
5         2  22    61
6         2  28   311
7         3  70   311
8         3  47   311
9         3  46   311
10        4  17   311
11        4  19   311
12        4  56   311
13        4  33   311

I want to write a simple script that outputs each CloneID to a different file, so in this case there would be 4 files. The first file would be named 'CloneID1.txt' and would look like this:

CloneID  ID   VGene
     1   73   64D
     1   68   64D
     1   1    64D
     1   94   61
     1   42   61

The second file would be named 'CloneID2.txt':

CloneID  ID  VGene
     2   22   61
     2   28   311

The third file would be named 'CloneID3.txt':

CloneID  ID  VGene
     3   70   311
     3   47   311
     3   46   311

And the last file would be named 'CloneID4.txt':

CloneID  ID VGene 
    4    17   311
    4    19   311
    4    56   311
    4    33   311

The code I found online was:

import pandas as pd
data = pd.read_excel('data.xlsx')

for group_name, data in data.groupby('CloneID'):
    with open('results.csv', 'a') as f:
        data.to_csv(f)

But it appends every group to the same file, 'results.csv', instead of writing one file per CloneID.

score 5 · Accepted Answer · answered May 13 '16 at 17:51

You can do something like the following:

In [19]:
gp = df.groupby('CloneID')
for g in gp.groups:
    print('CloneID' + str(g) + '.txt')
    print(gp.get_group(g).to_csv())

CloneID1.txt
,CloneID,ID,VGene
0,1,73,64D
1,1,68,64D
2,1,1,64D
3,1,94,61
4,1,42,61

CloneID2.txt
,CloneID,ID,VGene
5,2,22,61
6,2,28,311

CloneID3.txt
,CloneID,ID,VGene
7,3,70,311
8,3,47,311
9,3,46,311

CloneID4.txt
,CloneID,ID,VGene
10,4,17,311
11,4,19,311
12,4,56,311
13,4,33,311

Here we iterate over the group keys in gp.groups, use each key to build the output file path, and call to_csv on the matching group, so the following should work for you:

gp = df.groupby('CloneID')
for g in gp.groups:
    path = 'CloneID' + str(g) + '.txt'
    gp.get_group(g).to_csv(path)

Actually the following would be even simpler:

gp = df.groupby('CloneID')
gp.apply(lambda x: x.to_csv('CloneID' + str(x.name) + '.txt'))
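For reference, here is a minimal sketch of the same idea that iterates over the groupby object directly, which yields each key together with its sub-frame. The helper name write_groups and its out_dir parameter are hypothetical, introduced only for illustration. It assumes you want tab-separated files without the index column, which is closer to the layout shown in the question. An explicit loop also avoids depending on how apply treats the grouping column, which recent pandas releases have started to deprecate.

```python
import os
import pandas as pd


def write_groups(df, key="CloneID", out_dir="."):
    """Write one tab-separated file per distinct value of `key`.

    Hypothetical helper for illustration; returns the paths it wrote.
    """
    paths = []
    # Iterating a groupby yields (key value, sub-DataFrame) pairs,
    # sorted by key value by default.
    for value, group in df.groupby(key):
        path = os.path.join(out_dir, f"{key}{value}.txt")
        # index=False drops the row labels; sep="\t" is an assumption
        # about the desired layout, not something the question specifies.
        group.to_csv(path, sep="\t", index=False)
        paths.append(path)
    return paths


# Sample data from the question.
df = pd.DataFrame({
    "ID": [73, 68, 1, 94, 42, 22, 28, 70, 47, 46, 17, 19, 56, 33],
    "CloneID": [1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4],
    "VGene": ["64D", "64D", "64D", 61, 61, 61, 311, 311, 311, 311,
              311, 311, 311, 311],
})
```

Calling write_groups(df) would then write CloneID1.txt through CloneID4.txt into the current directory. The column order in each file follows the DataFrame's own column order.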

How fast would this be if we had a file of 1 GB? – Angelo, Jan 31 '19 at 22:32
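Speed will depend on the machine and the data, so no timing is claimed here. If memory is the concern, one option is to process the input in chunks and append to the per-group files. The sketch below assumes the input is a CSV file (read_excel has no chunksize option); the function name split_csv_in_chunks, its parameters, and the chunk size are all hypothetical choices for illustration.

```python
import os
import pandas as pd


def split_csv_in_chunks(src_path, out_dir, key="CloneID", chunksize=100_000):
    """Split a large CSV into one file per `key` value, reading it in chunks.

    Hypothetical helper for illustration; returns the sorted output paths.
    """
    seen = set()  # output paths already created during this run
    for chunk in pd.read_csv(src_path, chunksize=chunksize):
        for value, group in chunk.groupby(key):
            path = os.path.join(out_dir, f"{key}{value}.txt")
            first = path not in seen
            # First write for a group: create/overwrite the file with a header.
            # Later writes: append the rows without repeating the header.
            group.to_csv(path, mode="w" if first else "a",
                         header=first, index=False)
            seen.add(path)
    return sorted(seen)
```

Only one chunk is held in memory at a time, at the cost of reopening each output file once per chunk in which that group appears.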
