0

I'm attempting to create a 10 minute running average of wind speeds in order to plot on a graph. My data has numerous uneven time steps within it. I am currently using the CSV module to read in and gather my data, I am not very familiar with pandas and have had issues in the past with it.

import matplotlib.pyplot as plt
import csv
from datetime import datetime

x=[]
y=[]

with open('KART_201901010000_201912310000.txt') as csvfile:
    plots = csv.reader(csvfile, delimiter=',')
    for row in plots:
        if 'M' == row[1]:
            continue
        else:           
            x.append(datetime.strptime(row[0],'%Y-%m-%d %H:%M'))            
            y.append(int(row[1]))   

plt.plot(x,y, label='Wind Speed')
plt.xlabel('Date and Time')
plt.ylabel('Wind Speed (Kts)')
plt.title('Wind Speed\nVersus Time')
plt.legend()
plt.show()

Here is a snippet of my data set showing one of the many uneven time steps.

2019-11-01 11:40,30
2019-11-01 11:45,35
2019-11-01 11:50,32
2019-11-01 11:55,34
2019-11-01 11:56,33
2019-11-01 12:00,33
2019-11-01 12:05,36
2019-11-01 12:10,31

The obvious general idea is to use a for loop to continue the calculations that I would need to average the data. The issue I am running into is how do I account for the uneven steps? Is there a way to use datetime to achieve this that I have no idea about?

Liam Healy
  • 13
  • 2
  • Take a look at this: [Pandas rolling mean by time interval](https://stackoverflow.com/questions/15771472/pandas-rolling-mean-by-time-interval) – xletmjm Feb 19 '20 at 23:49

1 Answers1

0

Something along the lines of:

import pandas as pd
df = pd.read_csv('KART_201901010000_201912310000.txt', header=1)
df.index = pd.to_datetime(df.index, format='%Y-%m-%d %H:%M')
df.rolling('10min', min_periods=1).mean()

I haven't tested it, details might differ. I know you are not familiar with pandas but implementing this feature on your own will take precious time I'm sure you'd gladly invest elsewhere.

This definitely works for the data you provided:

>>> series = pd.Series([30, 35, 32, 34, 33, 33, 36, 31],
                    index=[pd.Timestamp('2019-11-01 11:40'),
                        pd.Timestamp('2019-11-01 11:45'),
                        pd.Timestamp('2019-11-01 11:50'),
                        pd.Timestamp('2019-11-01 11:55'),
                        pd.Timestamp('2019-11-01 11:56'),
                        pd.Timestamp('2019-11-01 12:00'),
                        pd.Timestamp('2019-11-01 12:05'),
                        pd.Timestamp('2019-11-01 12:10')])
>>> series.rolling('10min', min_periods=1).mean()
Out:
2019-11-01 11:40:00    30.000000
2019-11-01 11:45:00    32.500000
2019-11-01 11:50:00    33.500000
2019-11-01 11:55:00    33.000000
2019-11-01 11:56:00    33.000000
2019-11-01 12:00:00    33.333333
2019-11-01 12:05:00    34.000000
2019-11-01 12:10:00    33.500000
dtype: float64
xletmjm
  • 257
  • 1
  • 3