Create Categorical Variable based on Maximum of Three Columns

Question

I have a dataframe with sentiment probabilities for certain news articles that looks like this:

sentimentPositive  sentimentNegative  sentimentNeutral
         0.219640           0.010708          0.769652
         0.539188           0.088198          0.372615
         0.561837           0.264411          0.173752
         0.570648           0.255499          0.173853
         0.525263           0.097155          0.377582

I now want to create a new categorical column that tells me which sentiment in each row has the highest probability, encoded as e.g. (0, 1, 2) for the dominant sentiment.

The final output should look like:

sentimentPositive  sentimentNegative  sentimentNeutral  Sentiment
         0.219640           0.010708          0.769652          2
         0.539188           0.088198          0.372615          0
         0.561837           0.264411          0.173752          0
         0.570648           0.255499          0.173853          0
         0.097155           0.525263          0.377582          1

I know that I can get the max values of the columns by:

df["max"] = df[["sentimentPositive","sentimentNegative","sentimentNeutral"]].max(axis=1)

I could then compare the values in the max column to the other columns to determine the category. But there should be a more pandas-idiomatic way to do it, right?

jezrael · Accepted Answer · 2019-03-21T14:17:54.753

Use numpy.argmax for positions:

cols = ["sentimentPositive", "sentimentNegative", "sentimentNeutral"]
# position (0, 1, 2) of the row-wise maximum, in the order of cols
df["max"] = df[cols].values.argmax(axis=1)
# for column names instead of positions:
# df["max"] = df[cols].idxmax(axis=1)
print(df)
   sentimentPositive  sentimentNegative  sentimentNeutral  max
0           0.219640           0.010708          0.769652    2
1           0.539188           0.088198          0.372615    0
2           0.561837           0.264411          0.173752    0
3           0.570648           0.255499          0.173853    0
4           0.097155           0.525263          0.377582    1
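
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the printed output above and writes the codes to a column named Sentiment, as the question asks. The SentimentName column and the use of pd.Categorical are my own additions for illustration, not part of the answer; they show one way to keep the column names as a categorical dtype whose integer codes follow the order of cols.

```python
import pandas as pd

# Sample data copied from the printed output above
df = pd.DataFrame({
    "sentimentPositive": [0.219640, 0.539188, 0.561837, 0.570648, 0.097155],
    "sentimentNegative": [0.010708, 0.088198, 0.264411, 0.255499, 0.525263],
    "sentimentNeutral":  [0.769652, 0.372615, 0.173752, 0.173853, 0.377582],
})
cols = ["sentimentPositive", "sentimentNegative", "sentimentNeutral"]

# Integer position of the row-wise maximum, in the order given by cols
df["Sentiment"] = df[cols].to_numpy().argmax(axis=1)

# Illustrative extra column (not in the original answer): the name of the
# winning column, stored as a categorical whose codes follow the order of cols
df["SentimentName"] = pd.Categorical(df[cols].idxmax(axis=1), categories=cols)

print(df[["Sentiment", "SentimentName"]])
print(df["SentimentName"].cat.codes.tolist())
```

If two columns tie for the maximum in a row, both argmax and idxmax return the first occurrence, so the order of cols decides the winner.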
