I have the below data set which is a reading of values every 5 seconds. I need to do two operations on the data set.
- Calculate average value for every minute from the data set
- Using the above minute average values, calculate hourly variation (i.e difference every minute values and sum total)
What would be the best way to achieve this?
2018-02-10 17:25:49.074206,340
2018-02-10 17:25:54.078155,340
2018-02-10 17:25:59.081041,340
2018-02-10 17:26:04.085504,340
2018-02-10 17:26:09.089500,340
2018-02-10 17:26:14.092926,340
2018-02-10 17:26:19.097002,340
2018-02-10 17:26:24.101067,340
2018-02-10 17:26:29.104451,340
2018-02-10 17:26:34.108283,340
2018-02-10 17:26:39.112641,340
2018-02-10 17:26:44.115325,340
2018-02-10 17:26:49.120067,340
2018-02-10 17:26:54.124166,340
2018-02-10 17:26:59.127224,340
I have looked at various posts of stackoverflow have the below less than optimal code, still norm_by_data1 has errors
import pandas as pd
from pandas import read_csv
from pandas import datetime
from matplotlib import pyplot
def parser(x):
return datetime.strptime(x, '%Y-%m-%d %H:%M:%S.%f').strftime('%Y-%m-%d %H:%M')
def parser1(x):
return datetime.strptime(x, '%Y-%m-%d %H:%M:%S').strftime('%Y-%m-%d %H')
def norm_by_data(x):
return x.mean()
prevrow = None
total = None
def norm_by_data1(x):
for row in x:
total += row - prevrow
prevrow = row
series = read_csv('water_data.txt', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
#print(series.head())
series.groupby(level=0).apply(norm_by_data).to_csv("tmp")
series1 = read_csv('tmp', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser1)
series1.groupby(level=0).apply(norm_by_data1)