Generate additional row values so each ID has the same amount of entries

Question

I have a dataframe in Python Pandas which I've shifted to obtain a 3 week view of a specific column:

id	state	week_number	year
A	1	52	2022
A	1	51	2022
A	0	50	2022

df["current_week"] = df.state
df["week_2"] = df.state.shift(-1) 
df["week_3"] = df.state.shift(-2)

To end up with:

id	state	week_number	year	current_week	week_2
A	1	52	2022	1	1
A	1	51	2022	1	0
A	0	50	2022	0	0
A	0	47	2022	0	0
A	0	46	2022	0	0

This achieves my requirement - however, some IDs will not have the full 52 weeks. I'd like to generate new rows for each ID that has missing weeks and impute 0 for state, only adding missing weeks - so leaving the existing as they are. So as an example, let's say ID A has week 52, 51 and 50 but is missing 49 and 48 - I'd like to achieve this:

id	state	week_number	year	current_week	week_2
A	1	52	2022	1	1
A	1	51	2022	1	0
A	0	50	2022	0	0
A	0	49	2022	0	0
A	0	48	2022	0	0
A	0	47	2022	0	0
A	0	46	2022	0	0

It seems extremely simple, I'm sure it's my lack of ability to concisely explain the problem which has prevented me from finding an answer on Google. Any help is greatly appreciated.

The input is a dataframe as described above (second) table. Each ID should have week 1-52, so 52 rows per ID. Some ID's have missing weeks - so I'd like to pad the dataframe by adding those missing weeks as additional rows, with state set to 0. — Pheonix, Feb 24 '23 at 18:05

score 1 · Accepted Answer · answered Feb 25 '23 at 01:03

I think this should solve your problem...

This code takes your first table as input, uses .groupby() to group based on id and generates a dataframe with all possibles weeks for each group. Then fill missing values with 0 (basically to set state to 0 as requested in your comment)

After that concatenate all the new dataframes and create the new columns shifting the state column.

groups_dfs = []
for name, group in df.groupby('id'):
    df_tmp = pd.DataFrame({'week_number': list(range(1, 53))})
    df_tmp['id'] = name
    df_tmp = pd.merge(df_tmp, group, on=['id', 'week_number'], how='left')
    df_tmp = df_tmp.fillna(0)
    groups_dfs.append(df_tmp)

df_fixed = pd.concat(groups_dfs)
df_fixed = df_fixed.sort_values(['id', 'week_number'])

df_fixed["current_week"] = df_fixed.state
df_fixed["week_2"] = df_fixed.state.shift(-1) 
df_fixed["week_3"] = df_fixed.state.shift(-2)

Timeless · Answer 2 · 2023-02-25T11:20:46.530

If you need a pure Pandorable approach, you can check @mozway's answer here.

Another approach would be to use complete from janitor :

The input is a dataframe as described above (second) table.

#pip install pyjanitor
import janitor

WEEK_RANGE = list(range(1, 53))

out1 = (
          df
            .complete("id", "year", {"week_number": WEEK_RANGE}, fill_value=0)
            .sort_values(by=["id", "year", "week_number"],
                         ascending=[True, False, False],
                         ignore_index=True)
       )

Or, if you need a more dynamic approach (starting from your first table/dataframe), use this :

out2 = (
          df
            .complete("id", "year", {"week_number": WEEK_RANGE}, fill_value=0)
            .sort_values(by=["id", "year", "week_number"],
                         ascending=[True, False, False],
                         ignore_index=True)
            .pipe(lambda df_: df_.assign(**{f"week_{w}": df_["state"].shift(-w+1, fill_value=0)
                                            for w in WEEK_RANGE}))
            .rename({"week_1": "current_week"}, axis=1)
       )

Output :

#First approach
print(out1)

   id  state  week_number  year  current_week  week_2  week_3
0   A      1           52  2022             1       1       0
1   A      1           51  2022             1       0       0
2   A      0           50  2022             0       0       0
3   A      0           49  2022             0       0       0
4   A      0           48  2022             0       0       0
.. ..    ...          ...   ...           ...     ...     ...
47  A      0            5  2022             0       0       0
48  A      0            4  2022             0       0       0
49  A      0            3  2022             0       0       0
50  A      0            2  2022             0       0       0
51  A      0            1  2022             0       0       0

[52 rows x 7 columns]

#Second appraoch
print(out2)

   id  state  week_number  year  current_week  ...  week_48  week_49  week_50  week_51  week_52
0   A      1           52  2022             1  ...        0        0        0        0        0
1   A      1           51  2022             1  ...        0        0        0        0        0
2   A      0           50  2022             0  ...        0        0        0        0        0
3   A      0           49  2022             0  ...        0        0        0        0        0
4   A      0           48  2022             0  ...        0        0        0        0        0
.. ..    ...          ...   ...           ...  ...      ...      ...      ...      ...      ...
47  A      0            5  2022             0  ...        0        0        0        0        0
48  A      0            4  2022             0  ...        0        0        0        0        0
49  A      0            3  2022             0  ...        0        0        0        0        0
50  A      0            2  2022             0  ...        0        0        0        0        0
51  A      0            1  2022             0  ...        0        0        0        0        0

[52 rows x 56 columns]

Generate additional row values so each ID has the same amount of entries

2 Answers2