I have a dataframe with multiple duplicate columns, and I would like to drop the duplicates of the "class" column while keeping the other duplicate columns intact. As shown below, there are many duplicate columns, but I am only interested in removing the extra copies of "class" ("class.1" and "class.2") and keeping a single copy of it. The other columns should stay intact, and the number of rows should not change.
Dataframe:
import pandas as pd

train = pd.DataFrame({'class': {0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7,
7: 8,
8: 1,
9: 2,
10: 3,
11: 4,
12: 5,
13: 6,
14: 7,
15: 8},
'class.1': {0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7,
7: 8,
8: 1,
9: 2,
10: 3,
11: 4,
12: 5,
13: 6,
14: 7,
15: 8},
'class.2': {0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7,
7: 8,
8: 1,
9: 2,
10: 3,
11: 4,
12: 5,
13: 6,
14: 7,
15: 8},
'x_feature_1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'x_feature_1.1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'x_feature_2': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'y_feature_1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'y_feature_2': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'y_feature_2.1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'z_feature_1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'z_feature_1.1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'z_feature_2': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296}})
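Because the copies carry distinct names ("class.1", "class.2"), pandas treats them as separate columns rather than as duplicate labels. That ".1"/".2" suffix pattern is what `pd.read_csv` produces for repeated header names, although whether this frame came from a CSV is an assumption. A quick sketch to confirm the suffixed copies really hold the same values as "class", using a compact rebuild of the frame above (`cls`, `feat`, `cols`, `class_cols` and `same_as_class` are hypothetical helper names):

```python
import pandas as pd

# Compact rebuild of the `train` frame above, with the same values.
# `cls`, `feat` and `cols` are hypothetical helper names used only here.
cls = [1, 2, 3, 4, 5, 6, 7, 8] * 2
feat = [-0.30424321, 1.6273111, 0.66127653, 0.0051847840000000004,
        1.2861978, -0.47925246, 1.4743277, 0.30530296] * 2
cols = ['class', 'class.1', 'class.2',
        'x_feature_1', 'x_feature_1.1', 'x_feature_2',
        'y_feature_1', 'y_feature_2', 'y_feature_2.1',
        'z_feature_1', 'z_feature_1.1', 'z_feature_2']
train = pd.DataFrame({c: (cls if c.startswith('class') else feat) for c in cols})

# Columns named exactly "class" or "class.<something>"
class_cols = [c for c in train.columns if c == 'class' or c.startswith('class.')]
print(class_cols)  # ['class', 'class.1', 'class.2']

# Series.equals compares values and dtype (not the Series name)
same_as_class = {c: train[c].equals(train['class']) for c in class_cols}
print(same_as_class)  # {'class': True, 'class.1': True, 'class.2': True}
```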
Expected output:
expected = pd.DataFrame({'class': {0: 1,
1: 2,
2: 3,
3: 4,
4: 5,
5: 6,
6: 7,
7: 8,
8: 1,
9: 2,
10: 3,
11: 4,
12: 5,
13: 6,
14: 7,
15: 8},
'x_feature_1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'x_feature_1.1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'x_feature_2': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'y_feature_1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'y_feature_2': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'y_feature_2.1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'z_feature_1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'z_feature_1.1': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296},
'z_feature_2': {0: -0.30424321,
1: 1.6273111,
2: 0.66127653,
3: 0.0051847840000000004,
4: 1.2861978,
5: -0.47925246,
6: 1.4743277,
7: 0.30530296,
8: -0.30424321,
9: 1.6273111,
10: 0.66127653,
11: 0.0051847840000000004,
12: 1.2861978,
13: -0.47925246,
14: 1.4743277,
15: 0.30530296}})
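Put differently, the expected frame is `train` with only "class.1" and "class.2" removed: the same 16 rows and the same column order. A minimal check of that reading on compactly rebuilt data (the helper names are hypothetical, and hard-coding the two column names is only for illustration):

```python
import pandas as pd

# Compact rebuild of `train` (same values as above); helper names are hypothetical.
cls = [1, 2, 3, 4, 5, 6, 7, 8] * 2
feat = [-0.30424321, 1.6273111, 0.66127653, 0.0051847840000000004,
        1.2861978, -0.47925246, 1.4743277, 0.30530296] * 2
train_cols = ['class', 'class.1', 'class.2',
              'x_feature_1', 'x_feature_1.1', 'x_feature_2',
              'y_feature_1', 'y_feature_2', 'y_feature_2.1',
              'z_feature_1', 'z_feature_1.1', 'z_feature_2']
train = pd.DataFrame({c: (cls if c.startswith('class') else feat) for c in train_cols})

# The expected output keeps every column except the two suffixed class copies
expected_cols = [c for c in train_cols if c not in ('class.1', 'class.2')]
expected = pd.DataFrame({c: (cls if c == 'class' else feat) for c in expected_cols})

# Raises AssertionError if values, dtypes, index or column order differ
pd.testing.assert_frame_equal(train.drop(columns=['class.1', 'class.2']), expected)
print(expected.shape)  # (16, 10)
```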
[in]:
train = train.loc[:,~(train["class"].duplicated())]
[out]:
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
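The error comes from the shape of the mask rather than from `duplicated()` itself: `train["class"].duplicated()` compares values down the rows, so it returns a boolean Series with one entry per row (index 0 to 15). `.loc[:, mask]` needs a mask with one entry per column, labelled like `train.columns`, and pandas cannot align the two. A minimal sketch of a column-wise mask built from the column names instead, assuming every unwanted copy is named "class.<digits>" (helper names hypothetical):

```python
import pandas as pd

# Compact rebuild of `train` (same values as above); helper names are hypothetical.
cls = [1, 2, 3, 4, 5, 6, 7, 8] * 2
feat = [-0.30424321, 1.6273111, 0.66127653, 0.0051847840000000004,
        1.2861978, -0.47925246, 1.4743277, 0.30530296] * 2
cols = ['class', 'class.1', 'class.2',
        'x_feature_1', 'x_feature_1.1', 'x_feature_2',
        'y_feature_1', 'y_feature_2', 'y_feature_2.1',
        'z_feature_1', 'z_feature_1.1', 'z_feature_2']
train = pd.DataFrame({c: (cls if c.startswith('class') else feat) for c in cols})

# The original mask has one entry per ROW, not per column
row_mask = train['class'].duplicated()
print(row_mask.shape, train.shape)  # (16,) (16, 12)

# Column-wise mask: True for names like "class.1", "class.2" (not plain "class")
col_mask = train.columns.str.match(r'class\.\d+$')

# Keep every column that is not a suffixed class copy; rows are untouched
result = train.loc[:, ~col_mask]
print(result.shape)  # (16, 10)
```

Because the mask only looks at names, "x_feature_1.1", "y_feature_2.1" and "z_feature_1.1" are kept even though they duplicate other columns.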
Edit: Added an example dataframe and the expected output dataframe.
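If the copies might not follow the "class.<n>" naming, a value-based variant is possible. Transposing the frame turns each column into a row, so `DataFrame.duplicated()` flags columns whose values repeat an earlier column. On its own that would also flag "x_feature_1.1" and the other feature copies, so the sketch below intersects it with a name filter (helper names hypothetical). Transposing copies the data and upcasts the mixed int/float columns to float64 for the comparison, so this suits modestly sized frames better than very wide ones.

```python
import pandas as pd

# Compact rebuild of `train` (same values as above); helper names are hypothetical.
cls = [1, 2, 3, 4, 5, 6, 7, 8] * 2
feat = [-0.30424321, 1.6273111, 0.66127653, 0.0051847840000000004,
        1.2861978, -0.47925246, 1.4743277, 0.30530296] * 2
cols = ['class', 'class.1', 'class.2',
        'x_feature_1', 'x_feature_1.1', 'x_feature_2',
        'y_feature_1', 'y_feature_2', 'y_feature_2.1',
        'z_feature_1', 'z_feature_1.1', 'z_feature_2']
train = pd.DataFrame({c: (cls if c.startswith('class') else feat) for c in cols})

# One boolean per column: True if its values repeat an earlier column
dup_by_value = train.T.duplicated()

# Only consider class-like columns, so other duplicate columns stay intact
is_class = train.columns.str.startswith('class')
to_drop = dup_by_value & is_class

# Boolean Series indexed by column labels, so .loc aligns it with the columns
result = train.loc[:, ~to_drop]
print(result.shape)  # (16, 10)
```

Selecting from `train` (not from `train.T`) keeps the original int64 dtype of "class" in the result.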