How to create new columns in python for loop using variable column names?

Question

For some background, I've been trying to take the average of a dataset for each fiscal week for each unique name in the dataset as well. I start with the dataset that looks something like this:

pad	fiscal_week	value
Verace	5	23
Jersey	5	20
Verace	5	20
Verace	5	22
Verace	6	30
Colorado	4	15

What I currently have:

    unique_week = df['fiscal_week'].unique()
    unique_week = sorted(unique_week)
    newCols = pd.DataFrame()

    for week_number in unique_week:
        unique_id = df['pad'].unique()
        turbine_reg = df[df['fiscal_week'] == week_number]
        newColname = 'FW' + str(week_number)

        for turbine_name in unique_id:
            turbine_name_reg = turbine_reg[turbine_reg['pad'] == turbine_name]
            value_mean = [turbine_name_reg['value'].mean()]

            newCols['Turbine'] = turbine_name
            newCols[newColname] = direct_mean

What I would like, is for the end product to look something like:

pad	FW1	FW2	FW3
Verace	22	23	24
Jersey	15	16	20
Colorado	23	25	16

Currently, I'm only getting the result for the last unique pad the loop run on and not saving the times it runs for the other pads. I know the loop is overwriting itself each time, but I'm not sure how to fix it.

Any ideas?

score 0 · Answer 1 · answered Aug 04 '21 at 13:58

Sounds like you actually want to use df.groupby to group the dataframe by the pad and fiscal_week columns.

For example, starting with this dataframe df:

         pad  fiscal_week  value
0    Verace             5     23
1    Jersey             5     20
2    Verace             5     20
3    Verace             5     22
4    Verace             6     30
5  Colorado             4     15

We can compute the pad-wise, fiscal week-wise mean with:

df.groupby(['pad', 'fiscal_week']).mean()

output:

                           value
pad       fiscal_week           
Colorado  4            15.000000
Jersey    5            20.000000
Verace    5            21.666667
          6            30.000000

Tom S · Answer 2 · 2021-08-04T14:04:16.223

I think pivot table is what you are looking for:

        pad  fiscal_week  value
0    Verace            5     23
1    Jersey            5     20
2    Verace            5     20
3    Verace            5     22
4    Verace            6     30
5  Colorado            4     15

table = pd.pivot_table(df, values='value', index=['pad'], columns=['fiscal_week'], aggfunc=np.mean)

fiscal_week     4          5     6
pad
Colorado     15.0        NaN   NaN
Jersey        NaN  20.000000   NaN
Verace        NaN  21.666667  30.0

score 0 · Answer 3 · answered Aug 04 '21 at 14:09

I think you can use groupby() function for get every pad's mean in fiscal weeks.

data.groupby(by = ["pad", "fiscal_week"])["value"].aggregate('mean')

This code spinnet will returns a table like this:

pad	fiscal_week	meanValue
Colorade	4	15
Jersey	5	20
Verace	5	21
Verace	6	30

Then you need to add unstack() function to get fiscal_weeks as columns. You can check this answer to get more detailed information about unstack

data.groupby(by = ["pad", "fiscal_week"])["value"].aggregate('mean').unstack()

Finally you will get a table like this:

fiscal_week	4	5	6
Colorade	x	x	x
Jersey	x	x	x
Verace	x	x	x

I think this is the format which you want. After these processes you can use functions to edit your table. (E.g rename, reset_index, drop)

This is exactly what I'm looking for and way more simpler than what I was trying to do! Thank you so much! — Andrew Barndt, Aug 04 '21 at 14:21

How to create new columns in python for loop using variable column names?

3 Answers3