I know pandas can be extremely intimidating when you're just learning Python, but trust me, it is the way to go instead of using the csv module. A few lines of pandas can replace the csv module plus for loops and manually defined variables.
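To give a rough idea of the difference, here is a minimal sketch that computes an hourly average both ways. The sample data, the column names (Index, A), and the variable names are made up for illustration, so treat it as a comparison under those assumptions rather than your exact case.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

import pandas as pd

# Hypothetical sample data standing in for a real CSV file
raw = """Index,A
2015-01-01 00:00:00,1.0
2015-01-01 00:30:00,3.0
2015-01-01 01:00:00,5.0
2015-01-01 01:30:00,7.0
"""

# csv module: parse each row, bucket it by hour, and average by hand
buckets = defaultdict(list)
for row in csv.DictReader(io.StringIO(raw)):
    ts = datetime.strptime(row["Index"], "%Y-%m-%d %H:%M:%S")
    buckets[ts.replace(minute=0, second=0)].append(float(row["A"]))
csv_means = {hour: sum(vals) / len(vals) for hour, vals in buckets.items()}

# pandas: the same hourly average, with the timestamp column parsed
# and used as the index while reading
df = pd.read_csv(io.StringIO(raw), parse_dates=["Index"], index_col="Index")
pandas_means = df.resample("h").mean()

print(csv_means)
print(pandas_means)
```

Both give the same hourly means; the pandas version just leaves the parsing, grouping, and averaging to the library.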
Commented out below is an example of how you'd read your data in, check the datatypes, resample the data, and write the result to a CSV file.
The biggest problem you might run into is getting your datatypes (dtypes) set properly. For example, if you read your data in and check the datatypes, you might see this:
df.dtypes
Index object
A float64
B float64
C float64
D float64
dtype: object
You first need the Index column in a datetime datatype. To do this, do the following:
df['Index'] = pd.to_datetime(df['Index'])
Then check your datatypes again to confirm you've converted Index to a datetime datatype:
Index datetime64[ns]
A float64
B float64
C float64
D float64
dtype: object
In order to resample in pandas, your index needs to be a DatetimeIndex. To set the index in a dataframe, use:
df = df.set_index('Index')
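If you want to try the conversion steps without touching your real file, here is a small sketch that runs them on in-memory data. The two-row CSV text and its values are invented for illustration; only the column name Index follows the example above.

```python
import io

import pandas as pd

# Hypothetical two-row CSV standing in for the real file
raw = (
    "Index,A,B\n"
    "2015-01-01 00:00:00,0.1,0.2\n"
    "2015-01-01 00:00:10,0.3,0.4\n"
)

df = pd.read_csv(io.StringIO(raw))
print(df.dtypes)  # 'Index' is read in as text here, not as a datetime

# Convert the text column to datetimes, then make it the index
df["Index"] = pd.to_datetime(df["Index"])
df = df.set_index("Index")

# resample needs this check to be True
print(isinstance(df.index, pd.DatetimeIndex))
```

The isinstance check is a quick way to confirm the index type, since the index no longer shows up in df.dtypes once it has been set.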
If your datatypes are now correct, you can perform the resample.
import pandas as pd
import numpy as np
#UNCOMMENT THE CODE PARTS BELOW IF DESIRED
## cp1252 encoding works best on my windows machine
#df = pd.read_csv('convertcsv.csv', encoding='cp1252')
## check datatypes to make sure they are not 'object' when it should be 'float64' or 'int64' for example
#print(df.dtypes)
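## you want to group by hour and find the average (aka: mean) which is where resample comes in
## the 'h' means 'hours' (older pandas documents it as 'H') and .mean() tells it what to do with the data after it groups by hour
## the how='mean' keyword from older pandas versions has been removed, so call .mean() on the result instead
#df = df.resample('h').mean()
## you want to write test1.csv . index=False leaves the index out of the file; drop that argument to keep the hourly timestamps
#df.to_csv('test1.csv', index=False)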
#Example
index = pd.date_range('1/1/2015', periods=6*60*3, freq='10s')
data = abs(np.random.randn(6*60*3, 4))
df = pd.DataFrame(data=data, index=index, columns=list('ABCD'))
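# resample no longer accepts how=; call the aggregation on the result ('h' = hourly, 'H' in older pandas)
df = df.resample('h').mean()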
print(df)
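One thing worth checking before you write the file is what index=False does to the output. Here is a small sketch, assuming made-up constant data and writing to in-memory buffers instead of test1.csv, that compares the header line with and without the index.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical data: three hours of 10-second samples, all ones
index = pd.date_range("1/1/2015", periods=6 * 60 * 3, freq="10s")
df = pd.DataFrame(np.ones((len(index), 2)), index=index, columns=list("AB"))
hourly = df.resample("h").mean()

# Write to in-memory buffers so no file is created
with_index = io.StringIO()
hourly.to_csv(with_index)
without_index = io.StringIO()
hourly.to_csv(without_index, index=False)

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]
print(repr(header_with))     # leading empty field is the unnamed index column
print(repr(header_without))  # only the data columns
```

With index=False the hourly timestamps are not written at all, so keep the index if you need to know which hour each row belongs to.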