0

For some background, I've been trying to take the average of a dataset for each fiscal week for each unique name in the dataset as well. I start with the dataset that looks something like this:

pad fiscal_week value
Verace 5 23
Jersey 5 20
Verace 5 20
Verace 5 22
Verace 6 30
Colorado 4 15

What I currently have:

    unique_week = df['fiscal_week'].unique()
    unique_week = sorted(unique_week)
    newCols = pd.DataFrame()

    for week_number in unique_week:
        unique_id = df['pad'].unique()
        turbine_reg = df[df['fiscal_week'] == week_number]
        newColname = 'FW' + str(week_number)

        for turbine_name in unique_id:
            turbine_name_reg = turbine_reg[turbine_reg['pad'] == turbine_name]
            value_mean = [turbine_name_reg['value'].mean()]

            newCols['Turbine'] = turbine_name
            newCols[newColname] = direct_mean

What I would like, is for the end product to look something like:

pad FW1 FW2 FW3
Verace 22 23 24
Jersey 15 16 20
Colorado 23 25 16

Currently, I'm only getting the result for the last unique pad the loop run on and not saving the times it runs for the other pads. I know the loop is overwriting itself each time, but I'm not sure how to fix it.

Any ideas?

3 Answers3

0

Sounds like you actually want to use df.groupby to group the dataframe by the pad and fiscal_week columns.

For example, starting with this dataframe df:

         pad  fiscal_week  value
0    Verace             5     23
1    Jersey             5     20
2    Verace             5     20
3    Verace             5     22
4    Verace             6     30
5  Colorado             4     15

We can compute the pad-wise, fiscal week-wise mean with:

df.groupby(['pad', 'fiscal_week']).mean()

output:

                           value
pad       fiscal_week           
Colorado  4            15.000000
Jersey    5            20.000000
Verace    5            21.666667
          6            30.000000
jfaccioni
  • 7,099
  • 1
  • 9
  • 25
0

I think pivot table is what you are looking for:

        pad  fiscal_week  value
0    Verace            5     23
1    Jersey            5     20
2    Verace            5     20
3    Verace            5     22
4    Verace            6     30
5  Colorado            4     15
table = pd.pivot_table(df, values='value', index=['pad'], columns=['fiscal_week'], aggfunc=np.mean)
fiscal_week     4          5     6
pad
Colorado     15.0        NaN   NaN
Jersey        NaN  20.000000   NaN
Verace        NaN  21.666667  30.0
Tom S
  • 591
  • 1
  • 5
  • 21
0

I think you can use groupby() function for get every pad's mean in fiscal weeks.

data.groupby(by = ["pad", "fiscal_week"])["value"].aggregate('mean')

This code spinnet will returns a table like this:

pad fiscal_week meanValue
Colorade 4 15
Jersey 5 20
Verace 5 21
Verace 6 30

Then you need to add unstack() function to get fiscal_weeks as columns. You can check this answer to get more detailed information about unstack

data.groupby(by = ["pad", "fiscal_week"])["value"].aggregate('mean').unstack()

Finally you will get a table like this:

fiscal_week 4 5 6
Colorade x x x
Jersey x x x
Verace x x x

I think this is the format which you want. After these processes you can use functions to edit your table. (E.g rename, reset_index, drop)

UncleBob
  • 41
  • 10