0

I am running a txt. file through python and trying to resample the data into daily averages. The txt. file contains text before the actual data set, so I read the file then deleted the first rows. The entire data set has ~3 million rows.

Sample data looks like this:

**There are 20 lines of text before the data           

     Date      Time             Time           Pressure     Temperature
   M/d/yyyy  HH:mm:ss            hr              psi            degF
  
  10/23/2019  16:39:25        0.000000         13.3120        106.0533
  10/23/2019  16:39:26        0.001389         13.2860        106.0155
  10/23/2019  16:39:27        0.002778         13.2952        105.9848
  10/23/2019  16:39:28        0.004167         13.2788        105.9535
  10/23/2019  16:39:29        0.005556         13.2704        105.9228
  10/23/2019  16:39:30        0.006944         13.2946        105.8922
  10/23/2019  16:39:31        0.008333         13.2684        105.8620

My code looks like this:

import pandas as pd

df = pd.read_csv('well_pressures.txt', header=None)
df.drop(df.head(20).index, inplace=True)
df.columns=['test']
df = df.test.str.split(expand=True)
df.rename(columns ={0:'Date', 1:'Time', 2:'Time2', 3:'Pressure', 4:'Temperature'},inplace=True)
df.drop(columns={'Time','Time2'}, inplace=True)

wellname = 'well 4' 
df['Wellbore'] = wellname 

df.set_index('Wellbore', inplace=True)
df['Date'] = pd.to_datetime(df['Date'])
df = df.resample('D').mean()

When I run the code, I get the error "TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'". I'm not sure how to fix this, even after resetting the index. When I set the date column as the index and try resampling, I get a key error. The only way I can get this to work is if I index the date column when reading the file, and deleting the unnecessary first few rows BEFORE importing the file (but I'd rather not have to do it this way).

Expected output:

 Wellbore        Date       Pressure     Temperature
  Well 4      10/23/2019      13.34        106.1
  Well 4      10/24/2019      13.20        106.2
  Well 4      10/25/2019      13.36        105.9
  Well 4      10/26/2019      13.29        105.9
  Well 4      10/27/2019      13.27        105.8

Any suggestions? Thanks!

kn2298
  • 49
  • 5

1 Answers1

1

I think you are looking for pd.Grouper. Also, there are a load of parameters with read_csv that you can ultilize to save a lot of code:

df = (pd.read_csv('well_pressures.txt', sep='', usecols = [0,3,4], skiprows=20,
                 names=['Date', 'Pressure', 'Temperature'], parse_dates=True)
      .assign(Wellbore='well 4')
      .set_index('Date'))
df = df.groupby(pd.Grouper(freq='1D')).mean() #Also, you can try df = df.resample('D').mean() but I think you are looking to groupby day.
df
David Erickson
  • 16,433
  • 2
  • 19
  • 35