I am running a txt. file through python and trying to resample the data into daily averages. The txt. file contains text before the actual data set, so I read the file then deleted the first rows. The entire data set has ~3 million rows.
Sample data looks like this:
**There are 20 lines of text before the data
Date Time Time Pressure Temperature
M/d/yyyy HH:mm:ss hr psi degF
10/23/2019 16:39:25 0.000000 13.3120 106.0533
10/23/2019 16:39:26 0.001389 13.2860 106.0155
10/23/2019 16:39:27 0.002778 13.2952 105.9848
10/23/2019 16:39:28 0.004167 13.2788 105.9535
10/23/2019 16:39:29 0.005556 13.2704 105.9228
10/23/2019 16:39:30 0.006944 13.2946 105.8922
10/23/2019 16:39:31 0.008333 13.2684 105.8620
My code looks like this:
import pandas as pd
df = pd.read_csv('well_pressures.txt', header=None)
df.drop(df.head(20).index, inplace=True)
df.columns=['test']
df = df.test.str.split(expand=True)
df.rename(columns ={0:'Date', 1:'Time', 2:'Time2', 3:'Pressure', 4:'Temperature'},inplace=True)
df.drop(columns={'Time','Time2'}, inplace=True)
wellname = 'well 4'
df['Wellbore'] = wellname
df.set_index('Wellbore', inplace=True)
df['Date'] = pd.to_datetime(df['Date'])
df = df.resample('D').mean()
When I run the code, I get the error "TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'". I'm not sure how to fix this, even after resetting the index. When I set the date column as the index and try resampling, I get a key error. The only way I can get this to work is if I index the date column when reading the file, and deleting the unnecessary first few rows BEFORE importing the file (but I'd rather not have to do it this way).
Expected output:
Wellbore Date Pressure Temperature
Well 4 10/23/2019 13.34 106.1
Well 4 10/24/2019 13.20 106.2
Well 4 10/25/2019 13.36 105.9
Well 4 10/26/2019 13.29 105.9
Well 4 10/27/2019 13.27 105.8
Any suggestions? Thanks!