1
customer    date    x   y   z
1   10/7/2015 0:00  4   4   
1   10/7/2015 1:00  5   9   1
1   10/9/2015 0:00  4   0   3
2   10/7/2015 0:00  8   8   4
2   10/7/2015 1:00  4       5
3   10/7/2015 0:00  1       
3   10/7/2015 1:00  4   0   
3   10/9/2015 0:00  4       0

In the above table, i want to create 3 folders based on the column 'customer' as 1,2,3 and each of the folders should have csv files created based on 'date' column. Note: date column should be grouped based on day and not time. for example, folder 1 should have 2 csv files as 1072015.csv(2 records) and 1092015.csv (1 record)

folder 2 should have 2 csv files with 1 record each.

BENY
  • 317,841
  • 20
  • 164
  • 234
kumarun91
  • 23
  • 3

1 Answers1

1

You can groupby customer, as well as use the dt accessor to groupby date:

g = df.groupby(['customer', df.date.dt.date])

If your date column is not datetime, just use df['date'] = pd.to_datetime(df['date']) first.


Now you can simply loop through each user and date to create your folders and files:

import os

for (user, date), group in g:
    if not os.path.exists(str(user)):
        os.makedirs(str(user))

    fn = date.strftime('%m%d%Y')

    group.iloc[:, 2:].to_csv(f'{user}/{fn}.csv', index=False)

The result looks like this:

test
├── 1
│   ├── 10072015.csv
│   └── 10092015.csv
├── 2
│   └── 10072015.csv
└── 3
    ├── 10072015.csv
    └── 10092015.csv

3 directories, 5 files

Here is an example of the files created:

x,y,z
8,8.0,4.0
4,,5.0
user3483203
  • 50,081
  • 9
  • 65
  • 94