
I have a DataFrame, self.meter_readings, whose index holds datetime values and which has a column of numbers, as below:

self.meter_readings['PointProduction']
2012-03     7707.443
2012-04     9595.481
2012-05     5923.493
2012-06     4813.446
2012-07     5384.159
2012-08     4108.496
2012-09     6370.271
2012-10     8829.357
2012-11     7495.700
2012-12    13709.940
2013-01     6148.129
2013-02     7249.951
2013-03     6546.819
2013-04     7290.730
2013-05     5056.485
Freq: M, Name: PointProduction, dtype: float64

I want to get the gradient of PointProduction against time, i.e. y = PointProduction and x = time. I'm currently trying to obtain m using a linear regression:

 m,c,r,x,y = stats.linregress(list(self.meter_readings.index),list(self.meter_readings['PointProduction']))

However I am getting an error:

 raise TypeError(other).

This seems to be because the x-axis values are timestamps as opposed to plain numbers.

How can I correct this?

Korem

2 Answers


Convert the datetime stamps on the x-axis to epoch time in seconds.

If the index holds datetime objects, you need to convert them to epoch time. For example, if ts is a datetime object, the following line does the conversion:

ts_epoch = int(ts.strftime('%s'))

Here is an example that converts the index into epoch seconds:

import pandas as pd
from datetime import datetime
import numpy as np

rng = pd.date_range('1/1/2011', periods=5, freq='H')
ts = pd.Series(np.random.randn(len(rng)), index=rng)

t =  ts.index
print [int(t[x].strftime('%s')) for x in range(len(t)) ]

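This code works on Python 2.7. Note that '%s' is not a documented strftime directive in Python; it is passed through to the platform's C library, so it may raise ValueError: Invalid format string on platforms that do not support it, such as Windows.

Applied to your problem, the solution could look like this: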

t =  self.meter_readings.index
indexes = [int(t[x].strftime('%s')) for x in range(len(t)) ]

m,c,r,x,y = stats.linregress(indexes,list(self.meter_readings['PointProduction']))
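
If the index is a PeriodIndex (as the comments below suggest) or strftime('%s') is not available on your platform, a possible alternative is to let pandas do the conversion. This is only a minimal sketch under some assumptions: the sample values are the first five readings from the question, each monthly period is represented by the start of its month, and the timestamps are treated as UTC (strftime('%s') uses local time instead, so the numbers can differ by a constant offset, which does not change the slope).

```python
import numpy as np
import pandas as pd
from scipy import stats

# First five readings from the question, indexed by monthly periods
values = [7707.443, 9595.481, 5923.493, 4813.446, 5384.159]
index = pd.period_range(start="2012-03", periods=len(values), freq="M")
series = pd.Series(values, index=index, name="PointProduction")

# Period -> Timestamp at the start of each month
timestamps = series.index.to_timestamp()

# Whole seconds elapsed since 1970-01-01 (timestamps treated as UTC)
elapsed = (timestamps - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
epoch_seconds = np.asarray(elapsed, dtype=float)

# The slope is in PointProduction units per second
result = stats.linregress(epoch_seconds, series.to_numpy())
slope_per_second = result.slope
```

Because the x values are seconds, the slope comes out very small; multiply by 86400 if you want it per day.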
TommasoF
  • Thanks TommasoF, my index is not strings however, but rather timestamps already, of the form Period('2013-01', 'M'). – vbastrangledpython Jul 07 '14 at 14:12
  • Hi TommasoF, for some weird reason the print statement makes my interpreter crash. I've tried to work out why this is by running your code but that doesn't seem to run, it gives the error: ValueError: Invalid format string. This may be a mistake on my end but I think I'm running the lines as above. Does this work for you? Many thanks. – vbastrangledpython Jul 07 '14 at 16:34
  • The code is fully working on my side. I've added an import and specified the python version. – TommasoF Jul 08 '14 at 07:02
  • Sorry I tried to but my reputation is too low. I'll vote it up once my reputation is higher. – vbastrangledpython Jul 08 '14 at 17:18
0

You could try converting each Period in the index to its integer ordinal; linregress should then work with your freq='M' index.

import pandas as pd
from scipy import stats

data = [
    7707.443,
    9595.481,
    5923.493,
    4813.446,
    5384.159,
    4108.496,
    6370.271,
    8829.357,
    7495.700,
    13709.940,
    6148.129,
    7249.951,
    6546.819,
    7290.730,
    5056.485,
]

period_index = pd.period_range(start='2012-03', periods=len(data), freq='M')

df = pd.DataFrame(data=data,
                  index=period_index,
                  columns=['PointProduction'])

# these ordinals are months since the start of the Unix epoch
df['ords'] = [tstamp.ordinal for tstamp in df.index]
# linregress returns slope, intercept, r-value, p-value and standard error
m, c, r, p, stderr = stats.linregress(list(df['ords']),
                                      list(df['PointProduction']))
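
Since the ordinals increase by exactly 1 per month, the slope from this approach should be in PointProduction units per month. As a rough sanity check, here is a sketch using made-up data (not the question's readings) that rises by exactly 100 per month; the regression should recover that slope. The variable names are just for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Twelve monthly periods starting at the same month as the question's data
index = pd.period_range(start="2012-03", periods=12, freq="M")

# Integer ordinals: months since the start of the Unix epoch
ords = np.array([period.ordinal for period in index], dtype=float)

# Synthetic (hypothetical) production that rises by exactly 100 per month
production = 5000.0 + 100.0 * (ords - ords[0])

fit = stats.linregress(ords, production)
slope_per_month = fit.slope
```

With real, noisy readings the slope will of course not be exact, but the units are the same.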
  • Thanks lb n-plus-1. This looks good and I thought it would work, but there still seems to be something different about how my data is indexed to your example above, as I'm getting an attribute error: AttributeError: 'Period' object has no attribute 'toordinal'. – vbastrangledpython Jul 07 '14 at 15:53
  • @vbastrangledpython I've updated my answer to be compatible with a `PeriodIndex`d timeseries. – Laurence Billingham Jul 08 '14 at 06:54