I have a dataframe with sentiment probabilities for certain news articles that looks like this:
sentimentPositive sentimentNegative sentimentNeutral
0.219640 0.010708 0.769652
0.539188 0.088198 0.372615
0.561837 0.264411 0.173752
0.570648 0.255499 0.173853
0.525263 0.097155 0.377582
I now want to create a new categorical column that tells me which sentiment in each row has the highest probability, and encode the dominant sentiment as e.g. 0, 1, 2.
The final output should look like:
sentimentPositive sentimentNegative sentimentNeutral Sentiment
0.219640 0.010708 0.769652 2
0.539188 0.088198 0.372615 0
0.561837 0.264411 0.173752 0
0.570648 0.255499 0.173853 0
0.525263 0.097155 0.377582 0
I know that I can get the row-wise maximum across the three columns with:
df["max"] = df[["sentimentPositive","sentimentNegative","sentimentNeutral"]].max(axis=1)
I could then compare the values in the max column with the other columns to determine the category. But there should be a more idiomatic pandas way to do it, right?
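To make the workaround concrete, here is a minimal sketch of the comparison I have in mind, using the sample rows from the first table. The use of `np.select`, the `-1` fallback value, and the names `cols` and `row_max` are my own assumptions for illustration, not something I am committed to:

```python
import numpy as np
import pandas as pd

cols = ["sentimentPositive", "sentimentNegative", "sentimentNeutral"]

# Sample data copied from the first table in the question.
df = pd.DataFrame(
    [
        [0.219640, 0.010708, 0.769652],
        [0.539188, 0.088198, 0.372615],
        [0.561837, 0.264411, 0.173752],
        [0.570648, 0.255499, 0.173853],
        [0.525263, 0.097155, 0.377582],
    ],
    columns=cols,
)

# Row-wise maximum across the three probability columns.
row_max = df[cols].max(axis=1)

# One boolean mask per column: True where that column holds the row maximum.
conditions = [df[col].eq(row_max).to_numpy() for col in cols]

# np.select takes the code of the first matching mask, so ties go to the
# earlier column; -1 is an assumed fallback for rows with no match (e.g. NaN).
df["Sentiment"] = np.select(conditions, [0, 1, 2], default=-1)

print(df)
```

This works, but it needs an intermediate maximum plus one comparison per column, which is the part I would like to avoid.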