2

I have a dataframe that looks like this:

FakeDist -5 -4 -3 -2 -1 0 1 2 3 4 5
1 37 14 17 29 31 34 32 31 21 17 18
2 12 13 12 16 30 33 37 32 32 15 42
3 40 16 29 31 36 32 30 19 16 15 12
4 12 14 12 28 28 30 29 27 16 18 33
5 12 13 16 17 28 32 33 30 29 17 35

I want to add a column that will be the Column_Name of the Maximum Value per Row.
I did that with:

df['MaxVal_Dist'] = df.idxmax(axis=1)  

Which gives me this df:

FakeDist -5 -4 ... MaxVal_Dist
1 37 14 ... -5
2 12 13 ... 5
3 40 16 ... -5
4 12 14 ... 5
5 12 13 ... 5

But my real end point would be to add an if condition. I want the Max Value for the column where 'FakeDist' is between -2 and 2. To have the following result:

FakeDist -5 -4 ... MaxVal_Dist
1 37 14 ... 0
2 12 13 ... 1
3 40 16 ... -1
4 12 14 ... 0
5 12 13 ... 1

I did try to look at how to add a df.apply but couldn't find how to make it work.
I have a "work around" idea that would be to store a subset of column (from -2 to 2) in a new dataframe, create my new column to get the max there, and then add that result column to my initial dataframe but it seem to me to be a very not elegant solution and I am sure there is much better to do.

I would be really glad to learn the elegant way to do that from you !

Ch3steR
  • 20,090
  • 4
  • 28
  • 58
Orohena
  • 33
  • 6

2 Answers2

2

You can use boolean indexing with loc to filter the columns in the range -2 to 2, then use idxmax along axis=1:

c = df.columns.astype(int)
df['MaxVal_Dist'] = df.loc[:, (c >= -2) & (c <= 2)].idxmax(1)

Result:

FakeDist  -5  -4  -3  -2  -1   0   1   2   3   4   5 MaxVal_Dist
1         37  14  17  29  31  34  32  31  21  17  18           0
2         12  13  12  16  30  33  37  32  32  15  42           1
3         40  16  29  31  36  32  30  19  16  15  12          -1
4         12  14  12  28  28  30  29  27  16  18  33           0
5         12  13  16  17  28  32  33  30  29  17  35           1
Shubham Sharma
  • 68,127
  • 6
  • 24
  • 53
  • This works perfectly for what I wanted to do ! Thanks Shubham. To all the people as lost as I was, it also works if your number have decimals. Also, make sure that your first row is the header ! Mine was not and so it initially didn't work. I followed this: [link](https://stackoverflow.com/questions/31328861/python-pandas-replacing-header-with-top-row) – Orohena Dec 17 '20 at 09:41
1

You can try List comprehension:

In [1159]: cols = [i for i in df.columns[1:] if -2 <= int(i) <= 2]

In [1161]: df['MaxVal_Dist'] = df[cols].idxmax(axis=1)

In [1162]: df
Out[1162]: 
   FakeDist  -5  -4  -3  -2  -1   0   1   2   3   4   5 MaxVal_Dist
0         1  37  14  17  29  31  34  32  31  21  17  18           0
1         2  12  13  12  16  30  33  37  32  32  15  42           1
2         3  40  16  29  31  36  32  30  19  16  15  12          -1
3         4  12  14  12  28  28  30  29  27  16  18  33           0
4         5  12  13  16  17  28  32  33  30  29  17  35           1
Mayank Porwal
  • 33,470
  • 8
  • 37
  • 58
  • This also works perfectly for what I wanted to do ! Thanks Mayank. To all the people as lost as I was, it also works if your number have decimals. Also, make sure that your first row is the header ! Mine was not and so it initially didn't work. I followed this: [link](https://stackoverflow.com/questions/31328861/python-pandas-replacing-header-with-top-row) – Orohena Dec 17 '20 at 09:44