1

I've researched this question heavily for the past few days and I still cannot find suggestions to my problem.

Below is an example of my dataframe titled 'dfs'. There are around 80 columns, only 4 shown in the below example.

dfs is a large dataframe consisting of rows of data reported every 15 minutes for over 12 months (i.e. 2015-08-01 00:00:00 to 2016-09-30 23:45:00). The Datetime column is in the format datetime.

enter image description here

...

enter image description here enter image description here

...

enter image description here

I want to export (or write) multiple monthly csv files, which are snippets of monthly data taken from the original large csv file (dfs). For each month, I want a file to be written that contains the the raw data, day data (6am-6pm) and night data (6pm-6am). I also want the name of each monthly file to be automated so it knows whether to call itself dfs_%Y%m, or dfs_day_%Y%m, or dfs_night_%Y%m depending on the data it contains.

At the moment I am writing out over 180 lines of code to export each csv file.

For example:

I create monthly raw, day and night files by grabbing the data between the datetimes listed below from the index Datetime column

dfs201508 = dfs.ix['2015-08-01 00:00:00':'2015-08-31 23:45:00']

dfs201508Day = dfsDay.ix['2015-08-01 00:00:00':'2015-08-31 23:45:00']

dfs201508Night = dfsNight.ix['2015-08-01 00:00:00':'2015-08-31 23:45:00']

Then I export these files to their respective outputpaths and give them a filename

dfs201508 = dfs201508.to_csv(outputpath+"dfs201508.csv")

dfs201508Day = dfs201508Day.to_csv(outputpathDay+"dfs_day_201508.csv")

dfs201508Night = dfs201508Night.to_csv(outputpathNight+"dfs_night_201508.csv")

What I want to write is something like this

dfs_%Y%m = dfs.ix["%Y%m"]

dfs_day_%Y%m = dfs.ix["%Y%m(between 6am-6pm)"]

dfs_night_%Y%m = dfs.ix["%Y%m(between 6pm-6am)"]

dfs_%Y%m = dfs_%Y%m.to_csv(outputpath +"dfs_%Y%m.csv")

dfs_day_%Y%m = dfs_day_%Y%m.to_csv(outputpath%day +"dfs_day_%Y%m.csv")

dfs_night_%Y%m = dfs_night_%Y%m.to_csv(outputpath%night +"dfs_night_%Y%m.csv")

Any suggestions on the code to automate this process would be greatly appreciated.

Here are some links to pages I researched:

https://www.youtube.com/watch?v=aeZKJGEfD7U

Writing multiple Python dictionaries to csv file

Open a file name +date as csv in Python

Community
  • 1
  • 1
Monte Cristo
  • 131
  • 1
  • 12

1 Answers1

1

You can use a for loop to iterate over the years and months contained within dfs. I created a dummy dataframe called DF in the below example, which contains just three sample columns:

dates               Egen1_kwh   Egen2_kwh
2016-01-01 00:00:00 15895880    15877364
2016-01-01 00:15:00 15895880    15877364
2016-01-01 00:30:00 15895880    15877364
2016-01-01 00:45:00 15895880    15877364
2016-01-01 01:00:00 15895880    15877364

The below code filters the main dataframe DF into smaller dataframes (NIGHT and DAY) for each month within each year and saves them to as .csv with a name corresponding to their date (e.g. 2016_1_DAY and 2016_1_NIGHT for Jan 2016 Day and Jan 2016 Night).

import pandas as pd
import datetime
from dateutil.relativedelta import relativedelta
from random import randint

# I defined a sample dataframe with dummy data
start = datetime.datetime(2016,1,1,0,0)
dates = [start + relativedelta(minutes=15*i) for i in range(0,10000)]
Egen1_kwh = randint(15860938,15898938)
Egen2_kwh = randint(15860938,15898938)

DF = pd.DataFrame({
        'dates': dates,
        'Egen1_kwh': Egen1_kwh,
        'Egen2_kwh': Egen2_kwh,
    })


# define when day starts and ends (MUST USE 24 CLOCK)
day = {
        'start': datetime.time(6,0),  # start at 6am (6:00)
        'end': datetime.time(18,0)  # ends at 6pm (18:00)
      }


# capture years that appear in dataframe
min_year = DF.dates.min().year
max_year = DF.dates.max().year

if min_year == max_year:
    yearRange = [min_year]
else:
    yearRange = range(min_year, max_year+1)

# iterate over each year and each month within each year
for year in yearRange:
    for month in range(1,13):

        # filter to show NIGHT and DAY dataframe for given month within given year
        NIGHT = DF[(DF.dates >= datetime.datetime(year, month, 1)) & 
           (DF.dates <= datetime.datetime(year, month, 1) + relativedelta(months=1) - relativedelta(days=1)) & 
           ((DF.dates.apply(lambda x: x.time()) <= day['start']) | (DF.dates.apply(lambda x: x.time()) >= day['end']))]

        DAY = DF[(DF.dates >= datetime.datetime(year, month, 1)) & 
           (DF.dates <= datetime.datetime(year, month, 1) + relativedelta(months=1) - relativedelta(days=1)) & 
           ((DF.dates.apply(lambda x: x.time()) > day['start']) & (DF.dates.apply(lambda x: x.time()) < day['end']))]

        # save to .csv with date and time in file name
        # specify the save path of your choice
        path_night = 'C:\\Users\\nickb\\Desktop\\stackoverflow\\{0}_{1}_NIGHT.csv'.format(year, month)
        path_day = 'C:\\Users\\nickb\\Desktop\\stackoverflow\\{0}_{1}_DAY.csv'.format(year, month)

        # some of the above NIGHT / DAY filtering will return no rows.
        # Check for this, and only save if the dataframe contains rows
        if NIGHT.shape[0] > 0:
            NIGHT.to_csv(path_night, index=False)
        if DAY.shape[0] > 0:
            DAY.to_csv(path_day, index=False)
NickBraunagel
  • 1,559
  • 1
  • 16
  • 30
  • This worked great. Thank you. Question: In the output path script, can I change ".format(year, month)" somehow so that each month in the file name is seen as 0X, not just X? So January is 01, not 1 and so on... October would remain as 10 and so on for Nov and Dec. – Monte Cristo Nov 21 '16 at 04:10
  • Yeah, you can: check the length of the number and add a zero to the beginning of it if the length is 1. Example: `if len(str(month)) == 1` then `.format(year, "0" + month)` else `format(year, month)`. – NickBraunagel Nov 21 '16 at 19:22
  • Thanks again, @NickBraunagel. You've been a massive help. I don't want you thinking I post questions without researching them first, but here is another that I can't find any answers to: http://stackoverflow.com/questions/40738857/for-loop-across-multiple-folders-to-resample-datetime-in-multiple-csv-files-and – Monte Cristo Nov 22 '16 at 10:04
  • Haha, no problem. I'm on vacation at the moment so it might be a few days before I can respond. – NickBraunagel Nov 22 '16 at 10:07
  • I am really struggling here. I've been googling for well over a week now and no one can answer my question. I can create directories that include all my csv files, which are located in sub-folders of folders, but I cannot manipulate all files at once (resampling for example) and then export all resampled csv files at once with separate names generated automatically. If you have time to help, my question is here: http://stackoverflow.com/questions/40782151/resampling-multiple-csv-files-and-automatically-saving-resampled-files-with-new Any suggestions would be much appreciated. – Monte Cristo Nov 29 '16 at 06:47
  • Ok, I'll take a look when I come back from vacation (in two days) – NickBraunagel Nov 29 '16 at 13:09
  • Cheers mate. Enjoy your Vacay! – Monte Cristo Nov 30 '16 at 00:54