1

I am novice using VBA to organize some data in an Excel sheet. I also have some experience in Python if that is easier.

I have a .csv file from a model that outputs a discontinuous time series (whenever there is inflow but not every 1-min time step):

Date/Time Drainage cm/BRA
5/10/14 11:00 0
5/10/14 11:01 1
5/10/14 11:02 2
5/10/14 11:03 2
5/10/14 11:04 1
5/11/14 7:00 1
5/11/14 7:01 0
5/11/14 7:02 1
5/11/14 7:03 1
5/11/14 7:04 0
5/11/14 7:05 0

What I want is to be able to aggregate the data to 2-min time steps. So that I get this:

Date/Time Drainage cm/BRA
5/10/14 11:00 1
5/10/14 11:02 4
5/10/14 11:04 1
5/10/14 11:06 0
5/10/14 11:08 0
...

...
5/11/14 6:58 0
5/11/14 7:00 1
5/11/14 7:02 2
5/11/14 7:04 0

I already have a code that will aggregate the data for me as long as I have a continuous data file (in this example filling in the gaps between 5/10/14 11:04 to 5/11/14 7:00). As you can imagine, with minute data, there is too much to manually fill the gaps not to mention that I have to reproduce this for over 70 output files.

Any suggestions for creating a continuous time series or aggregating to 2-min despite uneven gaps in the time series??

S3DEV
  • 8,768
  • 3
  • 31
  • 42
wlisenb
  • 11
  • 2
  • You might try curve fitting algorithms. – SmileyFtW Feb 15 '20 at 21:25
  • Using Pandas seems like the logical (easiest route) for this. Check out [this question](https://stackoverflow.com/questions/32147224/how-to-groupby-time-series-by-10-minutes-using-pandas); might get you started in the right direction. First, read your data into a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), using the [pandas.DataFrame.read_excel()](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html) function. – S3DEV Feb 15 '20 at 21:43
  • @braX I tried summing every two columns to aggregate but that doesn't work when skipping steps (for instance 5/10/14 11:00 value and 5/11/14 7:00 value would be added as the total for the 2-min timestep 11:00-11:02) . Next, I tried to use an if statement to verify that the previous cell was in fact only a 2-min difference but when I do that it outputs correctly for 11:02 but skips 11:04 (because the next cell 7:00 is not a 2-min step) and then starts on 7:01 because that is 2 cells below 11:04 . This won't work because now all the timesteps end in odd numbers instead of even. – wlisenb Feb 15 '20 at 22:34
  • Thanks @S3DEV. I will look into it. – wlisenb Feb 15 '20 at 22:36
  • What should the value be for the interpolated times? Should the missing values be 0? – S3DEV Feb 16 '20 at 12:09
  • Yes any time not listed between values is zero – wlisenb Feb 16 '20 at 13:10

1 Answers1

1

Give this solution a try. It employs pandas and the following techniques:

  • Reading a CSV file into a DataFrame
  • Combining column data
  • Converting a date/time string to a datetime datatype
  • Resampling the datetime object to 2 minute intervals

Data Quality Issues:

There are a couple data quality issues in your CSV where:

  • The data is not actually comma separated
  • There are spaces in the column names
  • There are two columns, however three spaces in the data (date, time, drainage)

Give the above, there is a section of the code which addresses these issues.

Sample Code:

import pandas as pd

# Read Excel file into DataFrame.
df = pd.read_csv('./drainage.csv', sep=' ')

# Address data quality issues:
# Combine date and time columns.
df['Date/Time Drainage'] = df['Date/Time'] + ' ' + df['Drainage']
# Drop unneeded columns.
df.drop(['Date/Time', 'Drainage'], axis=1, inplace=True)
# Ensure Date/Time column is a datetime datatype.
df['Date/Time Drainage'] = pd.to_datetime(df['Date/Time Drainage'], format='%m/%d/%y %H:%M')

# Resample to 2 second intervals and sum results.
df = df.set_index('Date/Time Drainage').resample('2T').sum().reset_index()

# Show results.
print(df)

Output:

     Date/Time Drainage  cm/BRA
0   2014-05-10 11:00:00       1
1   2014-05-10 11:02:00       4
2   2014-05-10 11:04:00       1
3   2014-05-10 11:06:00       0
4   2014-05-10 11:08:00       0
5   2014-05-10 11:10:00       0
...
597 2014-05-11 06:54:00       0
598 2014-05-11 06:56:00       0
599 2014-05-11 06:58:00       0
600 2014-05-11 07:00:00       1
601 2014-05-11 07:02:00       2
602 2014-05-11 07:04:00       0
S3DEV
  • 8,768
  • 3
  • 31
  • 42