I need your help on a complicated (for me) groupby on pandas. I have the following dataframe.
id | target | feature_id | feature_value |
---|---|---|---|
6f281111 | 0 | right-acx_02 | 5.433747 |
6f281111 | 0 | right-min | 0.097000 |
a1111628 | 1 | left_ss_01 | -0.650741 |
a1111628 | 1 | right-wcs | 3.359375 |
6f282222 | 0 | left_l_tf | 0.000000 |
..... | ..... | ..... | ...... |
For each "id" I have around 400 rows representing a different feature. I want to reformat the dataframe to have something like that :
index | target | right-acx_02 | right-min | left_ss_01 | right-wcs | left_l_tf | ... |
---|---|---|---|---|---|---|---|
6f281111 | 0 | 5.433747 | 0.097000 | NA | NA | NA | ... |
a1111628 | 1 | NA | NA | -0.650741 | 3.359375 | NA | ... |
6f282222 | 0 | NA | NA | NA | NA | 0.000000 | ... |
... | ... | .... | .... | .... | .... | ... | ... |
This is what I did by now it works but I know it's far from being optimal.
unique_uuids = list(raw_df["evaluation_uuid"].unique())
values = np.insert(raw_df["feature_id"].unique(), 0, ["user_id"])
dict_feature = dict.fromkeys(np.insert(raw_df["feature_id"].unique(), 0, ["user_id"]))
pre_df = {uuid:dict_feature for uuid in unique_uuids}
df_transformed = pd.DataFrame.from_dict(pre_df, orient="index")
for index, row in tqdm(self.raw_df.iterrows()):
df_transformed.loc[row['evaluation_uuid'], row["feature_id"]] = row["feature_value"]
df_transformed.loc[row['evaluation_uuid'], "user_id"] = row["user_id"]
df_transformed = df_transformed.fillna(0)
I imagine there is a vectorized solution to this problem, if you need any detail let me know.
Thank you in advance.