2

I have a dataframe in Python Pandas which I've shifted to obtain a 3 week view of a specific column:

id state week_number year
A 1 52 2022
A 1 51 2022
A 0 50 2022
df["current_week"] = df.state
df["week_2"] = df.state.shift(-1) 
df["week_3"] = df.state.shift(-2)

To end up with:

id state week_number year current_week week_2 week_3
A 1 52 2022 1 1 0
A 1 51 2022 1 0 0
A 0 50 2022 0 0 0
A 0 47 2022 0 0 0
A 0 46 2022 0 0 0

This achieves my requirement - however, some IDs will not have the full 52 weeks. I'd like to generate new rows for each ID that has missing weeks and impute 0 for state, only adding missing weeks - so leaving the existing as they are. So as an example, let's say ID A has week 52, 51 and 50 but is missing 49 and 48 - I'd like to achieve this:

id state week_number year current_week week_2 week_3
A 1 52 2022 1 1 0
A 1 51 2022 1 0 0
A 0 50 2022 0 0 0
A 0 49 2022 0 0 0
A 0 48 2022 0 0 0
A 0 47 2022 0 0 0
A 0 46 2022 0 0 0

It seems extremely simple, I'm sure it's my lack of ability to concisely explain the problem which has prevented me from finding an answer on Google. Any help is greatly appreciated.

Pheonix
  • 93
  • 5
  • Your question is unclear, what exactly is the input? – mozway Feb 24 '23 at 18:03
  • The input is a dataframe as described above (second) table. Each ID should have week 1-52, so 52 rows per ID. Some ID's have missing weeks - so I'd like to pad the dataframe by adding those missing weeks as additional rows, with state set to 0. – Pheonix Feb 24 '23 at 18:05

2 Answers2

1

I think this should solve your problem...

This code takes your first table as input, uses .groupby() to group based on id and generates a dataframe with all possibles weeks for each group. Then fill missing values with 0 (basically to set state to 0 as requested in your comment)

After that concatenate all the new dataframes and create the new columns shifting the state column.

groups_dfs = []
for name, group in df.groupby('id'):
    df_tmp = pd.DataFrame({'week_number': list(range(1, 53))})
    df_tmp['id'] = name
    df_tmp = pd.merge(df_tmp, group, on=['id', 'week_number'], how='left')
    df_tmp = df_tmp.fillna(0)
    groups_dfs.append(df_tmp)

df_fixed = pd.concat(groups_dfs)
df_fixed = df_fixed.sort_values(['id', 'week_number'])

df_fixed["current_week"] = df_fixed.state
df_fixed["week_2"] = df_fixed.state.shift(-1) 
df_fixed["week_3"] = df_fixed.state.shift(-2)
Pedro Rocha
  • 1,373
  • 1
  • 3
  • 14
1

If you need a pure Pandorable approach, you can check @mozway's answer here.

Another approach would be to use complete from :

The input is a dataframe as described above (second) table.

#pip install pyjanitor
import janitor

WEEK_RANGE = list(range(1, 53))

out1 = (
          df
            .complete("id", "year", {"week_number": WEEK_RANGE}, fill_value=0)
            .sort_values(by=["id", "year", "week_number"],
                         ascending=[True, False, False],
                         ignore_index=True)
       )

Or, if you need a more dynamic approach (starting from your first table/dataframe), use this :

out2 = (
          df
            .complete("id", "year", {"week_number": WEEK_RANGE}, fill_value=0)
            .sort_values(by=["id", "year", "week_number"],
                         ascending=[True, False, False],
                         ignore_index=True)
            .pipe(lambda df_: df_.assign(**{f"week_{w}": df_["state"].shift(-w+1, fill_value=0)
                                            for w in WEEK_RANGE}))
            .rename({"week_1": "current_week"}, axis=1)
       )

Output :

#First approach
print(out1)

   id  state  week_number  year  current_week  week_2  week_3
0   A      1           52  2022             1       1       0
1   A      1           51  2022             1       0       0
2   A      0           50  2022             0       0       0
3   A      0           49  2022             0       0       0
4   A      0           48  2022             0       0       0
.. ..    ...          ...   ...           ...     ...     ...
47  A      0            5  2022             0       0       0
48  A      0            4  2022             0       0       0
49  A      0            3  2022             0       0       0
50  A      0            2  2022             0       0       0
51  A      0            1  2022             0       0       0

[52 rows x 7 columns]

#Second appraoch
print(out2)

   id  state  week_number  year  current_week  ...  week_48  week_49  week_50  week_51  week_52
0   A      1           52  2022             1  ...        0        0        0        0        0
1   A      1           51  2022             1  ...        0        0        0        0        0
2   A      0           50  2022             0  ...        0        0        0        0        0
3   A      0           49  2022             0  ...        0        0        0        0        0
4   A      0           48  2022             0  ...        0        0        0        0        0
.. ..    ...          ...   ...           ...  ...      ...      ...      ...      ...      ...
47  A      0            5  2022             0  ...        0        0        0        0        0
48  A      0            4  2022             0  ...        0        0        0        0        0
49  A      0            3  2022             0  ...        0        0        0        0        0
50  A      0            2  2022             0  ...        0        0        0        0        0
51  A      0            1  2022             0  ...        0        0        0        0        0

[52 rows x 56 columns]
Timeless
  • 22,580
  • 4
  • 12
  • 30