I have a dataframe df:
import numpy as np
import pandas as pd

df = pd.DataFrame(
{
"type": ["E/2", "E/2", "E/2", "E/2"],
"subtype1": ["N2", "N2", "N2", "N2"],
"subtype2": ["a", "a", "b", np.nan],
"subtype3": [np.nan, np.nan, np.nan, "xxx"],
"flex_best": [20, np.nan, 20, np.nan],
"flex_worst": [np.nan, 30, np.nan, 30],
"lead_best": [23, np.nan, 23, np.nan],
"is_best": [1, np.nan, 1, np.nan],
"lead_worst": [np.nan, 33, np.nan, 33],
"is_worst": [np.nan, 1, np.nan, 1],
}
)
df.head()
| type | subtype1 | subtype2 | subtype3 | flex_best | flex_worst | lead_best | is_best | lead_worst | is_worst |
|------|----------|----------|----------|-----------|------------|-----------|---------|------------|---------|
| E/2 | N2 | a | NaN | 20 | NaN | 23 | 1 | NaN | NaN |
| E/2 | N2 | a | NaN | NaN | 30 | NaN | NaN | 33 | 1 |
| E/2 | N2 | b | NaN | 20 | NaN | 23 | 1 | NaN | NaN |
| E/2 | N2 | NaN | xxx | NaN | 30 | NaN | NaN | 33 | 1 |
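I want to collapse rows that share the same values in the key columns ["type", "subtype1", "subtype2", "subtype3"] into a single row, filling each NaN with the value from the matching row where one exists.
So the "best" and "worst" values of each pair below should end up on the same row: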
- "flex_best" + "flex_worst"
- "lead_best" + "lead_worst"
- "is_best" + "is_worst"
This should result in this dataframe:
| type | subtype1 | subtype2 | subtype3 | flex_best | flex_worst | lead_best | is_best | lead_worst | is_worst |
|------|----------|----------|----------|-----------|------------|-----------|---------|------------|---------|
| E/2 | N2 | a | NaN | 20 | 30 | 23 | 1 | 33 | 1 |
| E/2 | N2 | b | NaN | 20 | NaN | 23 | 1 | NaN | NaN |
| E/2 | N2 | NaN | xxx | NaN | 30 | NaN | NaN | 33 | 1 |
How can I do this with pandas?
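One possible approach, offered as a sketch rather than a definitive answer: group on the four key columns and take the first non-null value of every other column with `GroupBy.first()`. This assumes a pandas version that supports `dropna=False` in `groupby` (1.1 or later, and NaN handling in multi-column keys is more reliable in recent releases), so that rows whose keys contain NaN are kept as their own groups. It also assumes each group holds at most one non-null value per column; if there were several, only the first would survive. The names `keys` and `result` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "type": ["E/2", "E/2", "E/2", "E/2"],
        "subtype1": ["N2", "N2", "N2", "N2"],
        "subtype2": ["a", "a", "b", np.nan],
        "subtype3": [np.nan, np.nan, np.nan, "xxx"],
        "flex_best": [20, np.nan, 20, np.nan],
        "flex_worst": [np.nan, 30, np.nan, 30],
        "lead_best": [23, np.nan, 23, np.nan],
        "is_best": [1, np.nan, 1, np.nan],
        "lead_worst": [np.nan, 33, np.nan, 33],
        "is_worst": [np.nan, 1, np.nan, 1],
    }
)

# Columns that identify a logical row (hypothetical helper name).
keys = ["type", "subtype1", "subtype2", "subtype3"]

# dropna=False keeps groups whose key contains NaN;
# sort=False keeps groups in order of first appearance;
# first() returns the first non-null value per column within each group.
result = df.groupby(keys, dropna=False, sort=False, as_index=False).first()

print(result)
```

If a group could contain conflicting non-null values in the same column, an aggregation such as `max` or a custom function would be needed instead of `first()`.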