
I have a 3-dimensional matrix:

import numpy as np
matrix = np.random.randn(100,10,500)
  • X dimension (axis 0): samples of data
  • Y dimension (axis 1): parameter/variable type
  • Z dimension (axis 2): location
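To make the axis convention concrete, here is a small sketch. Slicing out one variable along the Y axis leaves a 2-D (samples × locations) array, and each criterion below can be evaluated on such a slice.

```python
import numpy as np

# Axis 0 = samples (X), axis 1 = variables (Y), axis 2 = locations (Z)
matrix = np.random.randn(100, 10, 500)

# Selecting one variable (here Y-index 0) leaves a (samples, locations) array
variable_0 = matrix[:, 0, :]
print(variable_0.shape)  # (100, 500)

# A single value is addressed as matrix[sample, variable, location]
print(matrix[50, 0, 31] == variable_0[50, 31])  # True
```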

In the above matrix, there are 100 samples for 10 variables at 500 locations. For example, if I index variable #1, then I have 100 samples of that variable (spanning 20 seconds) at each of the 500 locations.

I need to identify which samples and which locations match certain criteria. Matches are evaluated on the 2D slice for a particular variable/parameter. Each match is a pair giving the sample and location where the criteria matched. For example, a match for the above 3D matrix could be at sample = 50 and location = 31. There can be multiple matching pairs. I generally return them as an array of tuples, where each tuple contains the sample index and the location index.

The criteria can specify one or more of:

  • Ranges of values: e.g. between -1.0 and 5.5 (inclusive)
  • Individual values: e.g. the value must equal 1.39

These ranges and individual values can be specified for one or more variables/parameters.
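Both kinds of criterion turn into element-wise boolean masks over a (samples, locations) slice. A small sketch on a hand-made slice follows. One caution that is an addition, not part of the question: randn produces continuous floats, so exact == will almost never be true. np.isclose with an explicit tolerance is a common substitute. The atol value below is an arbitrary assumption.

```python
import numpy as np

# Small hand-made (samples, locations) slice for one variable
values = np.array([[-2.0, 0.0, 1.39],
                   [ 5.5, 6.0, 1.3900000001]])

# Range criterion with inclusive bounds: -1.0 <= v <= 5.5
in_range = (values >= -1.0) & (values <= 5.5)

# Individual-value criterion: exact equality vs. tolerance-based equality
exact = values == 1.39
approx = np.isclose(values, 1.39, rtol=0.0, atol=1e-6)  # atol is an assumed tolerance

print(in_range)
print(exact)
print(approx)
```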

For example:

  • Parameter #1 (Y-index = 0) : (-1.0 to 5.5) or (10.3 to 12.1) or 20.32
  • Parameter #5 (Y-index = 4) : 10.0 or (1.0 to 800.0)
  • Parameter #8 (Y-index = 7) : (50.0 to 100.0)

Additionally, I would need the ability to invert the criteria, for example:

  • Parameter #1 (Y-index = 0) : NOT ( (-1.0 to 5.5) or (10.3 to 12.1) or 20.32 )
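Inverting a parameter's criteria amounts to ORing its range and value masks together and then negating the result. On boolean arrays, NumPy's ~ is an element-wise logical NOT (np.logical_not is equivalent). Here is a minimal sketch for Parameter #1's example criteria. It uses a hand-made stand-in for matrix[:, 0, :] and exact equality for 20.32, as the question does.

```python
import numpy as np

# Hand-made stand-in for matrix[:, 0, :] (2 samples x 3 locations)
slice_0 = np.array([[0.0, 7.0, 11.0],
                    [20.32, -3.0, 5.5]])

# (-1.0 to 5.5) or (10.3 to 12.1) or 20.32
criteria_mask = (
    ((slice_0 >= -1.0) & (slice_0 <= 5.5))
    | ((slice_0 >= 10.3) & (slice_0 <= 12.1))
    | (slice_0 == 20.32)
)

# NOT ( ... ): every element the criteria did not match
inverted_mask = ~criteria_mask
print(inverted_mask)
```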

I need a list of tuples giving the sample index (X-axis) and location index (Z-axis) wherever Parameter #1, Parameter #5 and Parameter #8 all match their conditions in the 3D matrix.
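Suppose the per-parameter masks have already been ANDed into a single (samples, locations) boolean mask. In that case np.argwhere returns the matching (sample, location) index pairs directly, and these convert to a list of tuples. The mask below is hypothetical and exists only for illustration.

```python
import numpy as np

# Hypothetical combined mask: True where every parameter matched
final_mask = np.array([[False, True,  False],
                       [False, False, False],
                       [True,  False, True]])

# Each row of argwhere's result is (sample_index, location_index), in row-major order
pairs = [tuple(int(i) for i in row) for row in np.argwhere(final_mask)]
print(pairs)  # [(0, 1), (2, 0), (2, 2)]
```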

I've been looking at np.intersect1d. I've also been using np.where in a loop, which is very inefficient, like this:

import numpy as np
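matrix = np.random.randn(100,10,500)

# Example criteria per parameter (Y-index): inclusive ranges and individual values
parameters = [0, 4, 7]
range_values = {0: [(-1.0, 5.5), (10.3, 12.1)], 4: [(1.0, 800.0)], 7: [(50.0, 100.0)]}
discrete_values = {0: [20.32], 4: [10.0], 7: []}

net_array = None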
for parameter in parameters: 

    total_result = None
    
    for lower_range_value, upper_range_value in range_values[parameter]:
        result = np.where( (matrix[:,parameter,:] >= lower_range_value) & (matrix[:,parameter,:] <= upper_range_value) )
        
        if result[0].size > 0: 
            if total_result is None:
                total_result = result
            else: 
                concat_0 = np.concatenate( (total_result[0], result[0]) )
                concat_1 = np.concatenate( (total_result[1], result[1]) )
                total_result = (concat_0,concat_1)
            
    for discrete_value in discrete_values[parameter]:
        result = np.where( matrix[:,parameter,:] == discrete_value )
        
        if result[0].size > 0:
            if total_result is None:
                total_result = result
            else: 
                concat_0 = np.concatenate( (total_result[0], result[0]) )
                concat_1 = np.concatenate( (total_result[1], result[1]) )
                total_result = (concat_0,concat_1)

    if total_result is None:
        # No matches for this parameter, so no (sample, location) pair can satisfy every parameter
        net_array = np.empty((0, 2), dtype=np.intp)
        break

    stacked_total_result = np.stack( [ total_result[0] , total_result[1] ] , axis=-1 )
    if net_array is None:
        net_array = stacked_total_result
    else:
        # Keep only the pairs in net_array that also appear in this parameter's matches
        match_indexes = (net_array[:,None] == stacked_total_result).all(-1).any(1)
        net_array = net_array[match_indexes]
        if not np.any(match_indexes):
            break
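The question mentions np.intersect1d. That function works on 1-D arrays, so one way to apply it to (sample, location) pairs is to encode each pair as a single flat index with np.ravel_multi_index, intersect those, and decode with np.unravel_index. This is a sketch of that alternative, not the asker's code. The thresholds are arbitrary placeholders chosen so that randn data produces matches.

```python
import numpy as np

n_samples, n_locs = 100, 500
rng = np.random.default_rng(1)
matrix = rng.standard_normal((n_samples, 10, n_locs))

# np.where on a 2-D slice returns (sample_indices, location_indices)
hits_a = np.where((matrix[:, 0, :] >= -1.0) & (matrix[:, 0, :] <= 0.5))
hits_b = np.where(matrix[:, 4, :] > 0.0)

# Encode each (sample, location) pair as one flat index so 1-D set operations apply
flat_a = np.ravel_multi_index(hits_a, (n_samples, n_locs))
flat_b = np.ravel_multi_index(hits_b, (n_samples, n_locs))

# Pairs present in both hit lists (intersect1d returns sorted, unique values)
common = np.intersect1d(flat_a, flat_b)
samples, locations = np.unravel_index(common, (n_samples, n_locs))
pairs = list(zip(samples.tolist(), locations.tolist()))
```

Within a single parameter, np.union1d on the flat indices would likewise merge several ranges without the duplicates that concatenation can produce.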

Is there an efficient way of finding the sample index (X-axis) and location index (Z-axis) where one or more parameters (Y-axis) each match their criteria?

geekygeek

1 Answer


I think you want something like this?

import numpy as np

(n_samples, n_vars, n_locs) = (100, 10, 500)
matrix = np.random.randn(n_samples, n_vars, n_locs)

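# Each parameter (Y-index) maps to a list of inclusive (min, max) ranges;
# an individual value v is written as the degenerate range (v, v)
param_idx2ranges = {
    0: [(-2.5, -2.0), (0.5, 0.75), (2.32, 2.32)],
    4: [(1.0, 1.0), (2.0, 40.0)],
    7: [(1.5, 2.0)],
}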
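# Start with every (sample, location) pair allowed, then AND in each parameter's mask
final_mask = np.ones((n_samples, n_locs), dtype="bool")
for (param_idx, ranges) in param_idx2ranges.items():
    # OR together all ranges for this parameter
    param_mask = np.zeros((n_samples, n_locs), dtype="bool")
    for (min_val, max_val) in ranges:
        param_mask |= (min_val <= matrix[:, param_idx]) & (
            matrix[:, param_idx] <= max_val
        )

    final_mask &= param_mask

# Each row of idxs is a (sample_index, location_index) pair
idxs = np.argwhere(final_mask)
# Values of all variables at each matching (sample, location)
print(matrix[idxs[:, 0], :, idxs[:, 1]])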

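Negating just involves applying the ~ operator where you need it. For example, to invert one parameter's criteria, use final_mask &= ~param_mask for that parameter instead of final_mask &= param_mask.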
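Putting the pieces together, here is one possible shape for a reusable helper that covers ranges, individual values and negation per parameter. The function name match_pairs and the "ranges" / "values" / "negate" dictionary format are hypothetical. Exact == is kept for individual values as in the question; swap in np.isclose if needed.

```python
import numpy as np


def match_pairs(matrix, criteria):
    """Return (sample, location) tuples where every listed parameter matches.

    criteria (hypothetical format):
        {param_idx: {"ranges": [(lo, hi), ...], "values": [v, ...], "negate": bool}}
    Ranges are inclusive; missing keys default to empty / False.
    """
    n_samples, _, n_locs = matrix.shape
    final_mask = np.ones((n_samples, n_locs), dtype=bool)
    for param_idx, spec in criteria.items():
        data = matrix[:, param_idx, :]
        param_mask = np.zeros((n_samples, n_locs), dtype=bool)
        # OR together all ranges and individual values for this parameter
        for lo, hi in spec.get("ranges", []):
            param_mask |= (data >= lo) & (data <= hi)
        for v in spec.get("values", []):
            param_mask |= data == v
        # Optionally invert: NOT (range1 or range2 or value ...)
        if spec.get("negate", False):
            param_mask = ~param_mask
        # AND across parameters
        final_mask &= param_mask
    return [tuple(int(i) for i in row) for row in np.argwhere(final_mask)]


# Tiny example: 2 samples, 3 variables, 2 locations
matrix = np.array([
    [[0.0, 3.0], [10.0, 1.0], [0.0, 0.0]],
    [[6.0, 20.32], [2.0, 900.0], [0.0, 0.0]],
])
criteria = {
    0: {"ranges": [(-1.0, 5.5), (10.3, 12.1)], "values": [20.32]},
    1: {"ranges": [(1.0, 800.0)], "values": [10.0]},
}
print(match_pairs(matrix, criteria))  # [(0, 0), (0, 1)]
```

Setting criteria[1]["negate"] = True in this example leaves only (1, 1), where parameter 0 matches and parameter 1 (value 900.0) falls outside its criteria.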

airalcorn2