0

I have a dataset like below, except the number of stores is over 30 and the number of rows is in the hundreds. The values represent sales for each store per week. I am attempting to find the highest value in a specific ROW, and what that corresponding store is (the column label):

print(df)
         week  storeA  storeB  storeC  storeD  storeE
0  2014-05-04    2643    8257    3893    6231    1294
1  2014-05-11    6444    5736    5634    7092    2907
2  2014-05-18    9646    2552    4253    5447    4736
3  2014-05-25    5960   10740    8264    6063     949
4  2014-06-01    7412    7374    3208    3985    3023

I can isolate the week easily enough, and identify the highest value, but I cannot figure out how to identify which store corresponds to that high value in the row. So let's say I'm trying to find the store with the highest sales for the week of 2014-05-25:

Using .max() doesn't work:

topsales = df[df['week'] == '2014-05-25']
print(topsales)

         week  storeA  storeB  storeC  storeD  storeE
3  2014-05-25    5960   10740    8264    6063     949

# Drop the week column
topsales = topsales.iloc[:, 1:]


topsales.max()

storeA     5960
storeB    10740
storeC     8264
storeD     6063
storeE      949
dtype: int64

Using .max(axis=1) just gets me the row number, not the column label:

topsales[topsales['week'] == '2014-05-25'].max(axis=1)

3    10740
dtype: int64

I have also tried .idxmax() and argmax() and neither of those work either. I have looked and looked and I can't find a solution. There has GOT to be an easy way to find the highest value in a row and have it tell you the corresponding column label. I can't believe this isn't something that could be useful. Thanks in advance.

  • So, more specifically, `df[df.columns.difference(["week"])].idxmax(axis=1)` – Mateen Ulhaq Apr 21 '20 at 21:57
  • One thing you certainly don't want to use is `max`. That just gives you the maximum *value*, not maximum *key* ("index"). – Mateen Ulhaq Apr 21 '20 at 21:58
  • That does work to show the column label, but is there a way to show both the value and the column label in one line of code? – GeekyPrimate Apr 21 '20 at 22:16
  • `df[df.columns.difference(["week"])].agg(["idxmax", "max"], axis=1)` – Mateen Ulhaq Apr 22 '20 at 00:23
  • You could probably also reuse the labels you have to lookup their values in order to avoid duplicate computation, but `idxmax` and `max` are both O(n) in the number of columns and quite fast anyways, so it doesn't really matter. – Mateen Ulhaq Apr 22 '20 at 00:25

0 Answers0