
I am stuck on the following lines:

import math
import quandl
import pandas as pd
import numpy as np
# model_selection replaces the cross_validation module removed in scikit-learn 0.20
from sklearn import preprocessing, model_selection, svm
from sklearn.linear_model import LinearRegression


df = quandl.get('WIKI/GOOGL')




df = df[['Adj. Open','Adj. High','Adj. Low','Adj. Close','Adj. Volume']]

df['HL_PCT'] = (df["Adj. High"] - df['Adj. Close'])/df['Adj. Close'] * 100
df['PCT_CHANGE'] = (df["Adj. Close"] - df['Adj. Open'])/df['Adj. Open'] * 100

df = df[['Adj. Close','HL_PCT','PCT_CHANGE','Adj. Open']]

forecast_col = 'Adj. Close'

df.fillna(-99999,inplace = True)

forecast_out = int(math.ceil(.1*len(df)))

df['label'] = df[forecast_col].shift(-forecast_out)
print(df.head())

I can't understand what df[forecast_col].shift(-forecast_out) means.

Please explain the command and what it does.

lmo
rithwik kukunuri
  • When forecasting, you're "lagging" the column, so the negative shift is shifting the column forward by the value of forecast_out. – TTT Jun 21 '17 at 12:22
  • did you read the [docs](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.shift.html#pandas.Series.shift)? – EdChum Jun 21 '17 at 12:23
  • @TTT like overlaying the curve snapshot starting from the current timestamp to the old snapshot, to see the curve tendency? – June Wang Aug 09 '19 at 10:36

2 Answers


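The shift method of pandas.DataFrame (and pandas.Series) shifts the index by the desired number of periods, with an optional time freq. When no freq is given, the index stays in place and the values move relative to it; the positions left empty are filled with NaN. For further information, see the pandas documentation for shift.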
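To make the role of freq concrete, here is a minimal sketch on a made-up three-day series (the dates and values are assumptions chosen only for illustration). It contrasts a plain shift, which moves the values, with a shift that passes freq, which moves the index labels instead.

```python
import pandas as pd

# Hypothetical three-day series, used only for illustration.
idx = pd.date_range("2017-06-19", periods=3, freq="D")
s = pd.Series([10.0, 20.0, 30.0], index=idx)

# Without freq: the index is unchanged, the values move down one row,
# and the vacated first position becomes NaN.
moved_values = s.shift(1)

# With freq: the values are unchanged and every index label
# moves forward by one day.
moved_index = s.shift(1, freq="D")

print(moved_values)
print(moved_index)
```

The code in the question uses the first form (no freq), so only the values move.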

Here is a small example of column values being shifted:

import pandas as pd 
import numpy as np
df = pd.DataFrame({"date": ["2000-01-03", "2000-01-03", "2000-03-05", "2000-01-03", "2000-03-05",
                        "2000-03-05", "2000-07-03", "2000-01-03", "2000-07-03", "2000-07-03"],
               "variable": ["A", "A", "A", "B", "B", "B", "C", "C", "C", "D"],
               "no": [1, 2.2, 3.5, 1.5, 1.5, 1.2, 1.3, 1.1, 2, 3],
               "value": [0.469112, -0.282863, -1.509059, -1.135632, 1.212112, -0.173215,
                         0.119209, -1.044236, -0.861849, None]})

Below are the column values before they are shifted:

df['value']

output

0    0.469112
1   -0.282863
2   -1.509059
3   -1.135632
4    1.212112
5   -0.173215
6    0.119209
7   -1.044236
8   -0.861849
9         NaN

With shift, the values move by the number of periods given.

For example, shift with a positive integer moves the row values downwards:

df['value'].shift(1)

output

0         NaN
1    0.469112
2   -0.282863
3   -1.509059
4   -1.135632
5    1.212112
6   -0.173215
7    0.119209
8   -1.044236
9   -0.861849
Name: value, dtype: float64

shift with a negative integer moves the row values upwards:

df['value'].shift(-1)

output

0   -0.282863
1   -1.509059
2   -1.135632
3    1.212112
4   -0.173215
5    0.119209
6   -1.044236
7   -0.861849
8         NaN
9         NaN
Name: value, dtype: float64
Akshay Kandul

The code here uses values from the future as the prediction target for 'Adj. Close'. forecast_out is 10% of the DataFrame's length (rounded up), and for each row df['label'] is filled with the 'Adj. Close' value from forecast_out rows later.

forecast_out = int(math.ceil(.1*len(df)))
df['label'] = df[forecast_col].shift(-forecast_out)

If you print df.tail(), you will see NaN in the label column, because the last forecast_out rows have no future value to shift in.
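The same idea can be sketched without downloading anything. The example below assumes a made-up 20-row frame standing in for the Quandl data (the values 100 to 119 are hypothetical), so forecast_out comes out as 2. The final dropna step is one common way to handle the unlabeled rows, not something the question's code does.

```python
import math
import pandas as pd

# Hypothetical stand-in for the 'Adj. Close' column: 20 rows, values 100..119.
df = pd.DataFrame({"Adj. Close": [100.0 + i for i in range(20)]})

forecast_col = "Adj. Close"
forecast_out = int(math.ceil(0.1 * len(df)))  # 10% of 20 rows -> 2

# Each row's label is the 'Adj. Close' value from forecast_out rows later.
df["label"] = df[forecast_col].shift(-forecast_out)

print(df.head(3))  # labels are 102.0, 103.0, 104.0
print(df.tail(3))  # the last forecast_out labels are NaN

# Rows without a known future value cannot serve as training examples,
# so one option (an assumption here) is to drop them.
train = df.dropna(subset=["label"])
print(len(train))
```

Row 0 has 'Adj. Close' 100.0 and label 102.0, that is, the close two rows ahead, which is what a model trained on this frame would learn to predict.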