I have a pandas DataFrame in this format:
TRUTH A B C CLASS
2020-01-01 00:00:00+00:00 1 1 2 1 A
2020-01-02 00:00:00+00:00 2 1 2 2 B
2020-01-03 00:00:00+00:00 3 2 2 3 C
2020-01-04 00:00:00+00:00 4 4 3 3 A
2020-01-05 00:00:00+00:00 3 8 3 3 C
...
The columns A, B and C represent predictions, and TRUTH is the actual value.
The column CLASS tells which prediction is the preferred one for that row.
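For anyone who wants to reproduce this, here is a minimal sketch that builds only the five rows shown above (the real frame continues past them). I'm assuming a daily UTC DatetimeIndex, which matches the printed timestamps, and the name df matches the code further down:

```python
import pandas as pd

# Only the five sample rows from the question; the real data is longer.
# Assumption: the index is a daily, UTC-aware DatetimeIndex.
index = pd.date_range("2020-01-01", periods=5, freq="D", tz="UTC")

df = pd.DataFrame(
    {
        "TRUTH": [1, 2, 3, 4, 3],
        "A": [1, 1, 2, 4, 8],
        "B": [2, 2, 2, 3, 3],
        "C": [1, 2, 3, 3, 3],
        "CLASS": ["A", "B", "C", "A", "C"],  # name of the preferred column per row
    },
    index=index,
)

print(df)
```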
I want to generate the final prediction by taking, for each row, the value from the column named in CLASS. For the rows above, that means the value from column A (1), then the value from B (2), then the value from C (3), then the value from A (4), then the value from C (3).
The result would be this:
TRUTH PREDICTION A B C CLASS
2020-01-01 00:00:00+00:00 1 1 1 2 1 A
2020-01-02 00:00:00+00:00 2 2 1 2 2 B
2020-01-03 00:00:00+00:00 3 3 2 2 3 C
2020-01-04 00:00:00+00:00 4 4 4 3 3 A
2020-01-05 00:00:00+00:00 3 3 8 3 3 C
...
I have some sample code that does this, but it's a little slow:
df["PREDICTION"] = [df.loc[i, col] for i, col in zip(df.index, df["CLASS"])]
There is most likely a better way to do this kind of manipulation, but I have no idea what it is.