0

I have 2 dataframes, res_params and res_tStats. The columns and index is identical in both dataframes. It's only the values within each row and column (equivalent to "cells" in Excel if you will) that are different.

I need to get the index of the rows and columns from the res_tStats dataframe where abs(res_tStats) > 1.96 and then apply formatting to the same index and rows in the res_params data frame.

What I've done

# res_tStats is a NxM dataframe with shape (6, 4)
# res_params is a NXM dataframe with shape (6, 4)

idx_required = np.where(abs(res_tStats) > 1.96)

print(idx_required)
# output: array([0, 0, 0, 5, 5, 5, 5]), array([0, 2, 3, 0, 1, 2, 3]))

So far so good. It gave me the right index locations for where abs(res_tStats) > 1.96. The following values are in fact significant:

row 0, col 0
row 0, col 2
row 0, col 3
row 5, col 0
row 5, col 1
row 5, col 2
row 5, col 3

Now, when I apply these index locations to res_params, I'm expecting to get the values at precisely the same index locations. However, I don't quite get that.

Running: res_params.iloc[idx_required] returns a df of shape (7, 7).

I double checked by applying idx_required to res_tStats, expecting to get only the values > 1.96, however that didn't happen.
res_tStats.iloc[idx_required] also returns a df of shape (7, 7) with some values within it being lower than 1.96

halfer
  • 19,824
  • 17
  • 99
  • 186
Vash
  • 190
  • 8
  • Please provide samples of your input dataframes, sample of your output, and sample of your expected output, to make a [mcve] we can test against. See [How to make good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – G. Anderson Mar 25 '20 at 18:02

2 Answers2

0

The problem is that idx_required is a tuple with indeces and columns, and .iloc selects those in a cross-product way if you pass a tuple.

A simple fix is:

res_params[np.abs(res_tStats)>1.96]

This will return a dataframe with values of res_params where np.abs(res_tStats)>1.96 and NaN where the condition does not hold true.

Another way, is to get the list of elements as:

res_params.to_numpy()[idx_required]

This way you will get the list of elements, without the respective index or column.

FBruzzesi
  • 6,385
  • 3
  • 15
  • 37
0

I will assume that you want a single 1D array or a Series with the 5 expected elements along with their indices.

A possible way would be the use the indices of the columns and stack the dataframe:

res_param.set_axis(range(len(res_param.columns)), axis=1, inplace=False).stack()[zip(*idx_required)]

It should give you a Series with the indices as a MultiIndex and the relevant values.

Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252
  • Cheers Serge, however I need the values within the dataframe itself since I'll be applying formatting to the values which meet the condition on the other df. The solution below works well. Many thanks for your response though! – Vash Mar 25 '20 at 18:16