Slicing a dataframe into folders and files based on respective columns

Question

customer  date            x  y  z
1         10/7/2015 0:00  4  4
1         10/7/2015 1:00  5  9  1
1         10/9/2015 0:00  4  0  3
2         10/7/2015 0:00  8  8  4
2         10/7/2015 1:00  4     5
3         10/7/2015 0:00  1
3         10/7/2015 1:00  4  0
3         10/9/2015 0:00  4     0

From the table above, I want to create 3 folders based on the 'customer' column (1, 2 and 3), and each folder should contain CSV files based on the 'date' column. Note: the date column should be grouped by day, not by time. For example, folder 1 should have 2 CSV files: 1072015.csv (2 records) and 1092015.csv (1 record).

Folder 2 should have 2 CSV files with 1 record each.

It would be nice to show your expected output instead of explaining it. Most of us here understand data better than natural language. — cs95, Apr 03 '19 at 21:40
Shouldn't folder 2 have one CSV file with 2 records? The date doesn't change. — user3483203, Apr 03 '19 at 21:43

score 1 · Answer 1 · answered Apr 03 '19 at 21:56

You can group by `customer` and, using the `dt` accessor, by the calendar day of `date` (so the time of day is ignored):

g = df.groupby(['customer', df.date.dt.date])

If your date column is not datetime, just use df['date'] = pd.to_datetime(df['date']) first.
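As a quick check of the grouping, here is a minimal sketch that rebuilds the question's table from text (typed in by hand, so the exact values are an assumption), parses the dates with an explicit format, and prints the size of each (customer, day) group. Empty fields in the table become `NaN`.

```python
import io

import pandas as pd

# Sample data typed in from the question's table; empty fields become NaN
csv_text = """customer,date,x,y,z
1,10/7/2015 0:00,4,4,
1,10/7/2015 1:00,5,9,1
1,10/9/2015 0:00,4,0,3
2,10/7/2015 0:00,8,8,4
2,10/7/2015 1:00,4,,5
3,10/7/2015 0:00,1,,
3,10/7/2015 1:00,4,0,
3,10/9/2015 0:00,4,,0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Parse the strings into datetimes so the .dt accessor is available
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y %H:%M')

# Group by customer and by calendar day (the time of day is dropped)
g = df.groupby(['customer', df['date'].dt.date])

for (customer, day), group in g:
    print(customer, day, len(group))
```

It should report five groups: two for customer 1, one for customer 2 (both of its rows fall on 10/7/2015) and two for customer 3.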

Now you can simply loop through each user and date to create your folders and files:

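import os

for (user, date), group in g:
    # One folder per customer, created only if it is not there yet
    if not os.path.exists(str(user)):
        os.makedirs(str(user))

    # File name from the calendar day, e.g. 10072015
    fn = date.strftime('%m%d%Y')

    # Drop the customer and date columns and write the remaining ones
    group.iloc[:, 2:].to_csv(f'{user}/{fn}.csv', index=False)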

The result looks like this:

test
├── 1
│   ├── 10072015.csv
│   └── 10092015.csv
├── 2
│   └── 10072015.csv
└── 3
    ├── 10072015.csv
    └── 10092015.csv

3 directories, 5 files

Here is one of the files created, `2/10072015.csv`:

x,y,z
8,8.0,4.0
4,,5.0
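To try the whole thing without writing into the current directory, here is a self-contained sketch that runs the same logic in a temporary directory and lists what it wrote. The helper name `export_by_customer_and_day` is hypothetical, and the sample data is typed in from the question's table. It builds paths with `os.path.join` rather than a hard-coded `/`.

```python
import io
import os
import tempfile

import pandas as pd

# Sample data typed in from the question's table (an assumption)
CSV_TEXT = """customer,date,x,y,z
1,10/7/2015 0:00,4,4,
1,10/7/2015 1:00,5,9,1
1,10/9/2015 0:00,4,0,3
2,10/7/2015 0:00,8,8,4
2,10/7/2015 1:00,4,,5
3,10/7/2015 0:00,1,,
3,10/7/2015 1:00,4,0,
3,10/9/2015 0:00,4,,0
"""


def export_by_customer_and_day(frame, root):
    """Hypothetical helper: write root/<customer>/<MMDDYYYY>.csv per (customer, day)."""
    for (user, day), group in frame.groupby(['customer', frame['date'].dt.date]):
        folder = os.path.join(root, str(user))
        if not os.path.exists(folder):
            os.makedirs(folder)
        path = os.path.join(folder, day.strftime('%m%d%Y') + '.csv')
        # Same column selection as the answer: everything after customer and date
        group.iloc[:, 2:].to_csv(path, index=False)


df = pd.read_csv(io.StringIO(CSV_TEXT))
df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y %H:%M')

with tempfile.TemporaryDirectory() as root:
    export_by_customer_and_day(df, root)
    # Relative paths of every file written, using '/' on every platform
    written = sorted(
        os.path.relpath(os.path.join(dirpath, name), root).replace(os.sep, '/')
        for dirpath, _, names in os.walk(root)
        for name in names
    )
    with open(os.path.join(root, '2', '10072015.csv')) as fh:
        sample = fh.read()

print(written)
print(sample)
```

The listing should match the tree above (5 files in 3 folders), and `sample` should hold the three lines shown for `2/10072015.csv`. The `y` and `z` values are written as floats because those columns contain `NaN`.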

Does the last line work on Python 2? group.iloc[:, 2:].to_csv(f'{user}/{fn}.csv', index=False) — kumarun91, Apr 04 '19 at 17:37
No, it uses `f-strings`. You would need to switch the string formatting. — user3483203, Apr 04 '19 at 17:38
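For example, a minimal sketch of that write step with `str.format` in place of the f-string. The `user`, `date` and `group` values below are hypothetical stand-ins for one iteration of the answer's loop, and the output goes to a fresh temporary folder. It should also run on Python 2.7, assuming a pandas version that still supports it.

```python
import datetime
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for one iteration of the answer's loop
user = 2
date = datetime.date(2015, 10, 7)
group = pd.DataFrame({'customer': [2, 2],
                      'date': ['10/7/2015 0:00', '10/7/2015 1:00'],
                      'x': [8, 4], 'y': [8.0, None], 'z': [4.0, 5.0]},
                     columns=['customer', 'date', 'x', 'y', 'z'])

root = tempfile.mkdtemp()
folder = os.path.join(root, str(user))
# os.makedirs(..., exist_ok=True) is Python 3 only, so check first
if not os.path.exists(folder):
    os.makedirs(folder)

fn = date.strftime('%m%d%Y')
# str.format instead of an f-string; works on Python 2.7 and Python 3
group.iloc[:, 2:].to_csv(os.path.join(folder, '{0}.csv'.format(fn)), index=False)
```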
