Return N Choose K Columns of An NxN array (Choosing N choose K features from a Correlation Matrix)

Question

I have a 40x40 Numpy array like so (here's a 4x4 example):

import numpy as np

Correlationarray = np.array([[1, .1, .3, .4],
                             [.1, 1, .2, .7],
                             [.3, .2, 1, .5],
                             [.4, .7, .5, 1]])

I would like to form all possible n-choose-k column subsets of this array and find the ones with the lowest overall correlation values. My idea is to compute all the subsets, store them in another array or list, and then write a function that squares each value of any given array in that collection and sums along the rows.

For example, suppose these were two of the arrays in the 4 choose 2 set of arrays

a = np.array([[1, .1],
              [.1, 1],
              [.3, .2],
              [.4, .7]])

b = np.array([[.1, .4],
              [1, .7],
              [.2, .5],
              [.7, 1]])

I'd like to square each value in the array, sum the values row-wise, and then sum the row scores. I want to save that score for this array and for every other array generated from the subsets of my correlation matrix, and then return the n arrays with the lowest sum of squares.
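A minimal sketch of the scoring step described above, assuming a "subset" means a choice of k columns with all rows kept (as in the arrays a and b). The names corr and subset_score are hypothetical, chosen only for illustration.

```python
import numpy as np

# The 4x4 example correlation matrix from the question.
corr = np.array([[1, .1, .3, .4],
                 [.1, 1, .2, .7],
                 [.3, .2, 1, .5],
                 [.4, .7, .5, 1]])

def subset_score(matrix, cols):
    """Square every entry of the chosen columns and sum them all.

    Summing each row and then summing the row totals gives the same
    number as summing every squared entry at once.
    """
    sub = matrix[:, list(cols)]      # keep all rows, only the chosen columns
    return float(np.sum(sub ** 2))

score_a = subset_score(corr, (0, 1))  # columns of array a
score_b = subset_score(corr, (1, 3))  # columns of array b
print(score_a, score_b)
```

Here a corresponds to columns 0 and 1 of the matrix and b to columns 1 and 3, so a has the lower score of the two.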

I'm not sure how to select all the subsets in a computationally efficient way, and I'm not sure how to return the arrays with the lowest correlation scores. (I'm familiar with returning elements of an array sorted by value from largest to smallest, but I'm not sure whether I can generalize this.)
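One possible way to enumerate the subsets and keep only the lowest-scoring ones is sketched below, using itertools.combinations for the column indices and heapq.nsmallest for the selection. It assumes the score is the sum of squared entries over the chosen columns; the names corr, lowest_scoring_subsets, k and n_best are hypothetical.

```python
from itertools import combinations
import heapq

import numpy as np

corr = np.array([[1, .1, .3, .4],
                 [.1, 1, .2, .7],
                 [.3, .2, 1, .5],
                 [.4, .7, .5, 1]])

def lowest_scoring_subsets(matrix, k, n_best):
    """Return the n_best (score, column_indices) pairs with the lowest score."""
    matrix = np.asarray(matrix, dtype=float)
    # Sum of squares of each column, computed once up front.
    col_sq = np.sum(matrix ** 2, axis=0)
    # Lazily score every k-column combination; nothing is stored
    # except the n_best candidates that nsmallest keeps.
    scored = (
        (float(col_sq[list(cols)].sum()), cols)
        for cols in combinations(range(matrix.shape[1]), k)
    )
    return heapq.nsmallest(n_best, scored)

best = lowest_scoring_subsets(corr, k=2, n_best=2)
for score, cols in best:
    print(cols, round(score, 2), corr[:, list(cols)].shape)
```

The arrays themselves can be rebuilt from the returned indices with corr[:, list(cols)], so only index tuples need to be kept. Because this particular score is additive over columns, the single best subset is just the k columns with the smallest per-column sums of squares; full enumeration matters only when several top subsets are wanted or when a different score is used, and the number of combinations grows very quickly for a 40x40 matrix.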

Thanks!

I guess you could use this - https://stackoverflow.com/questions/30143417 along with itertools.combinations/numpy.triu_indices. — Divakar, Oct 05 '17 at 21:29
