Bin values based on ranges with pandas

Question

I have multiple CSV files with values like this in a folder:

The GroupID.csv is the filename. There are multiple files like this, but the value ranges are defined in the same XML file. I'm trying to group them How can I do that?

UPDATE1: Based on BobHaffner's comments, I've done this

import pandas as pd 
import glob path =r'path/to/files' 
allFiles = glob.glob(path + "/*.csv")
frame = pd.DataFrame()
list_ = []
for file_ in allFiles:
    df = pd.read_csv(file_,index_col=None, header=None)
    df['file'] = os.path.basename('path/to/files/'+file_)
    list_.append(df)
frame = pd.concat(list_)
print frame

to get something like this:

I need to group the values based on the bins from the XML file. I'd truly appreciate any help.

There are a few questions here. I would suggest trying to get the csvs into one dataframe as a good first step. Check out the glob module and pandas read_csv() and concat() — Bob Haffner, Jul 31 '15 at 01:33
@BobHaffner I can do that. On concatenating, it'd lose track of the file name. Wouldn't it? — pam, Jul 31 '15 at 03:13
You could create an additional field in each dataframe with the file name. That would work, right? — Bob Haffner, Jul 31 '15 at 03:21
Haven't added the file name yet, but without it, I've done this `import pandas as pd import glob path =r'path/to/file' allFiles = glob.glob(path + "/*.csv") frame = pd.DataFrame() list_ = [] for file_ in allFiles: df = pd.read_csv(file_,index_col=None, header=None) list_.append(df) frame = pd.concat(list_) print frame` to get something like this: `0 1 0 person1 24 1 person2 17 2 person3 98 3 person4 6 0 person2 166 1 person3 125 2 person5 172` — pam, Jul 31 '15 at 03:27
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/84756/discussion-between-bob-haffner-and-pam). — Bob Haffner, Jul 31 '15 at 03:31

score 60 · Accepted Answer · answered Jul 31 '15 at 07:50

In order to bucket your series, you should use the pd.cut() function, like this:

df['bin'] = pd.cut(df['1'], [0, 50, 100,200])

         0    1        file         bin
0  person1   24     age.csv     (0, 50]
1  person2   17     age.csv     (0, 50]
2  person3   98     age.csv   (50, 100]
3  person4    6     age.csv     (0, 50]
4  person2  166  Height.csv  (100, 200]
5  person3  125  Height.csv  (100, 200]
6  person5  172  Height.csv  (100, 200]

If you want to name the bins yourself, you can use the labels= argument, like this:

df['bin'] = pd.cut(df['1'], [0, 50, 100,200], labels=['0-50', '50-100', '100-200'])

         0    1        file      bin
0  person1   24     age.csv     0-50
1  person2   17     age.csv     0-50
2  person3   98     age.csv   50-100
3  person4    6     age.csv     0-50
4  person2  166  Height.csv  100-200
5  person3  125  Height.csv  100-200
6  person5  172  Height.csv  100-200

Bin values based on ranges with pandas

1 Answers1

Linked

Related