0

I have a dataframe with years of data and many features.

For each of those features I want to create a new feature that averages the last 12 weeks of data.

So say I have weekly data. I want a datapoint for feature1B to give me the average of the last 12 rows of data from feature1A. And if the data is hourly, I want the same done but for the last 2016 rows (24 hours * 7 days * 12 weeks)

So for instance, say the data looks like this:

Week    Feature1
1       8846
2       2497
3       1987
4       5294
5       2487
6       1981
7       8973
8       9873
9       8345
10      5481
11      4381
12      8463
13      7318
14      8642
15      4181
16      3871
17      7919
18      2468
19      4981
20      9871

I need the code to loop through the multiple feature, create a feature name such as 'TARGET.'+feature and spit the averaged data based on my criteria (last 12 rows... last 2016 rows... depends on the format).

Week    Feature1    Feature1-B
1       8846    
2       2497    
3       1987    
4       5294    
5       2487    
6       1981    
7       8973    
8       9873    
9       8345    
10      5481    
11      4381    
12      8463    
13      7318        5717.333333
14      8642        5590
15      4181        6102.083333
16      3871        6284.916667
17      7919        6166.333333
18      2468        6619
19      4981        6659.583333
20      9871        6326.916667

Appreciate any help.

Tom
  • 85
  • 1
  • 7
  • 3
    Looks like you need [`rolling`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.rolling.html), ie for weekly data: `df['Feature1-B'] = df['Feature1'].rolling(12).mean()` – Chris Adams May 14 '20 at 09:27
  • 1
    superb ... something new I learned today. Thanks ! – Anshul May 14 '20 at 09:30
  • 1
    That's exactly what I was looking for Chris A. I got it all sorted now. – Tom May 14 '20 at 10:08

2 Answers2

1

Solved with the helpful comment from Chris A. Can't seem to mark that comment as an answer.

import pandas as pd

df = pd.read_csv('data.csv')
cols = df.iloc[:,2:].columns

for c in cols:
    df['12W_AVG.'+c] = df[c].rolling(2016).mean()
    df['12W_AVG.'+c] = df['12W_AVG.'+c].fillna(df['12W_AVG.'+c][2015])
    df['12W_AVG.'+c+'_LAL'] = df['12W_AVG.'+c]*0.9
    df['12W_AVG.'+c+'_UAL'] = df['12W_AVG.'+c]*1.1
    df.drop(c, axis=1, inplace=True)  
Tom
  • 85
  • 1
  • 7
0

Does this work for you?

import pandas as pd
import numpy as np

df = pd.DataFrame(columns=["week", "data"], data=[
  [1, 8846],
  [2,2497],
  [3,1987],
  [4,5294],
  [5,2487],
  [6,1981],
  [7,8973],
  [8,9873],
  [9,8345],
  [10,5481],
  [11,4381],
  [12,8463],
  [13,7318],
  [14,8642],
  [15,4181],
  [16,3871],
  [17,7919],
  [18,2468],
  [19,4981],
  [20,9871]])

df.insert(2, "average",0, True)

for length in range(12, len(df.index)):
  values = df.iloc[length-12:index, 1]
  weekly_sum = np.sum(values, axis=0)
  df.at[length, 'average'] = weekly_sum / 12

print(df)

mind you, this is very bad code and requires you to do some work on it yourself

Gertjan Brouwer
  • 996
  • 1
  • 12
  • 35