Split dataframe in pandas

Question

I have a CSV file that I read using pandas. I want to split the dataframe into chunks based on the values in a specified column:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

list_of_classes=[]
# Reading file
fileName = 'Training.csv'
df       = pd.read_csv(fileName)
classID  = df.iloc[:,-2]
len(classID)
df.iloc[0,-2]
for i in range(len(classID)):
    print(classID[i])
    if classID[i] not in list_of_classes:
        list_of_classes.append(classID[i])


for i in range(len(df)):
  ...............................

UPDATE

Say the dataframe looks like this:

........................................
Feature0  Feature1  Feature2  Feature3  ......... classID lastColum 


 190       565     35474  0.336283   2.973684       255         0   
 311       984    113199  0.316057   3.163987       155         0   
 310       984     94197  0.315041   3.174194      1005         0   
 280       984    116359  0.284553   3.514286       255        18   
 249       984    107482  0.253049   3.951807      1005         0   
 283       984    132343  0.287602   3.477032       155         0   
 213       984     88244  0.216463   4.619718       255         0   
 839       984    203139  0.852642   1.172825       255         0   
 376       984    105133  0.382114   2.617021      1005         0   
 324       984    129209  0.329268   3.037037      1005         0

In this example, the result I'm aiming for is 3 dataframes, each containing only one classID: 155, 1005, or 255. My question is: is there a cleaner way to do this?

Based on what? Where's your code? There's hardly any detail here. Did you look at http://stackoverflow.com/questions/26986372/split-pandas-dataframe-into-separate-pieces-based-on-column-values — EdChum, May 04 '17 at 13:10
@Engine, it's not clear what do you want to "split". Can you post a small reproducible data set and desired data set(s) - this would help us to understand what are you trying to do. [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) — MaxU - stand with Ukraine, May 04 '17 at 13:31

score 2 · Accepted Answer · answered May 04 '17 at 14:01

Split to 3 separate CSV files:

df.groupby('classID') \
  .apply(lambda x: x.to_csv(r'c:/temp/{}.csv'.format(x.name), index=False))
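Note that to_csv does not create missing directories, so the target folder has to exist before the call. A minimal sketch that creates it first, assuming a hypothetical helper name split_to_csv and a temporary output directory used only for the demonstration:

```python
import tempfile
from pathlib import Path

import pandas as pd


def split_to_csv(df, column, out_dir):
    """Write one CSV per distinct value of `column`; return {value: path}."""
    out_dir = Path(out_dir)
    # Create the directory (and any parents) so to_csv has somewhere to write.
    out_dir.mkdir(parents=True, exist_ok=True)
    written = {}
    for key, group in df.groupby(column):
        path = out_dir / f"{key}.csv"
        group.to_csv(path, index=False)
        written[key] = path
    return written


# Small illustrative frame; the values are taken from the question's sample.
df = pd.DataFrame({'Feature0': [190, 311, 280],
                   'classID': [255, 155, 255]})

with tempfile.TemporaryDirectory() as tmp:
    written = split_to_csv(df, 'classID', Path(tmp) / 'out')
    file_names = sorted(p.name for p in written.values())
```

With a real location, the call would look like split_to_csv(df, 'classID', r'c:/temp').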

Generate a dictionary of split DataFrames:

In [210]: dfs = {g:x for g,x in df.groupby('classID')}

In [211]: dfs.keys()
Out[211]: dict_keys([155, 255, 1005])

In [212]: dfs[155]
Out[212]:
   Feature0  Feature1  Feature2  Feature3  classID  lastColum
1       311       984    113199  0.316057      155          0
5       283       984    132343  0.287602      155          0

In [213]: dfs[255]
Out[213]:
   Feature0  Feature1  Feature2  Feature3  classID  lastColum
0       190       565     35474  0.336283      255          0
3       280       984    116359  0.284553      255         18
6       213       984     88244  0.216463      255          0
7       839       984    203139  0.852642      255          0

In [214]: dfs[1005]
Out[214]:
   Feature0  Feature1  Feature2  Feature3  classID  lastColum
2       310       984     94197  0.315041     1005          0
4       249       984    107482  0.253049     1005          0
8       376       984    105133  0.382114     1005          0
9       324       984    129209  0.329268     1005          0
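If only one class is needed, building the whole dictionary may be unnecessary. A short sketch using get_group on the grouped object, with a reduced two-column frame assumed here purely for illustration:

```python
import pandas as pd

# Reduced version of the question's sample: first six rows, two columns.
df = pd.DataFrame({'Feature0': [190, 311, 310, 280, 249, 283],
                   'classID': [255, 155, 1005, 255, 1005, 155]})

grouped = df.groupby('classID')

# Number of distinct classes found in the column.
n_classes = grouped.ngroups

# Pull out the rows for a single class without materialising the others.
only_155 = grouped.get_group(155)
```

The original row index is preserved, so only_155 keeps the labels 1 and 5 as in the output above.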

thanks a lot for your help, how may I avoid this error : FileNotFoundError: [Errno 2] No such file or directory: 'c:/temp/1020.csv' ? — Engine, May 04 '17 at 14:06
do you have such a directory: `c:/temp/`? If not you would need to change it correspondingly... — MaxU - stand with Ukraine, May 04 '17 at 14:12
well the creation I guess of the csv file, I need to check, not on right device now. thanks a lot — Engine, May 04 '17 at 17:18

score 0 · Answer 2 · answered May 04 '17 at 13:22

Here is an example of how you can do it:

import pandas as pd

df = pd.DataFrame({'A': list('abcdef'), 'part': [1, 1, 1, 2, 2, 2]})

parts = df.part.unique()

for part in parts:
    print(df.loc[df.part == part])

So the point is that you take all the unique parts by calling unique() on the Series that you want to split on.

After that, you can access those parts in a loop and do whatever you need with each of them.
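Applied to the question's classID column, the same idea can collect the pieces into a dictionary instead of printing them. This is only a sketch; the reduced frame and the name parts_by_class are assumptions for illustration:

```python
import pandas as pd

# Reduced version of the question's sample data.
df = pd.DataFrame({'Feature0': [190, 311, 310, 280],
                   'classID': [255, 155, 1005, 255]})

# One boolean mask per unique value; each mask selects that class's rows.
parts_by_class = {
    value: df.loc[df['classID'] == value]
    for value in df['classID'].unique()
}
```

Each mask scans the full column, so for many distinct values a single groupby pass is likely to be cheaper.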
