I have a 3-dimensional matrix:
import numpy as np
matrix = np.random.randn(100,10,500)
- X - dimension : samples of data
- Y - dimension : parameter/variable type
- Z - dimension : location
In the above matrix, there are 100 samples for 10 variables at 500 locations. For example, if I index variable #1, then I have 100 samples of that variable (spanning 20 seconds) at each of the 500 locations.
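As a quick illustration of this layout, indexing a single variable along the Y axis leaves a 2D array of samples by locations. The names `variable_1` and `samples_at_location` are only illustrative.

```python
import numpy as np

matrix = np.random.randn(100, 10, 500)

# Select variable #1 (Y-index 0): the result is samples x locations
variable_1 = matrix[:, 0, :]
print(variable_1.shape)  # (100, 500)

# All 100 samples of variable #1 at location index 31
samples_at_location = variable_1[:, 31]
print(samples_at_location.shape)  # (100,)
```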
I need to identify which samples and which locations match certain criteria. Matches are evaluated per 2D slice (obtained by indexing a particular variable/parameter) and come in pairs, each giving the sample and location where the criteria matched. For example, a match for the above 3D matrix would be at sample = 50 and location = 31. There could be multiple pairs of matches. I generally return this as an array of tuples, where each tuple contains the sample and location number.
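To make the desired output concrete, here is a small sketch that turns a boolean (samples x locations) mask into a list of (sample, location) tuples with `np.argwhere`. The mask contents and the name `matches` are made up for illustration.

```python
import numpy as np

# Hypothetical 2D mask for one variable: True where the criteria matched
mask = np.zeros((100, 500), dtype=bool)
mask[50, 31] = True
mask[7, 499] = True

# np.argwhere returns one (row, column) pair per True entry, in row-major order
pairs = np.argwhere(mask)
matches = [tuple(int(i) for i in pair) for pair in pairs]
print(matches)  # [(7, 499), (50, 31)]
```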
The criteria could specify one or more of:
- Ranges of values : between -1.0 and 5.5 for example
- Individual values : value must == 1.39 for example
These ranges and individual values can be specified for one or more variables/parameters.
For example:
- Parameter #1 (Y-index = 0) : (-1.0 to 5.5) or (10.3 to 12.1) or 20.32
- Parameter #5 (Y-index = 4) : 10.0 or (1.0 to 800.0)
- Parameter #8 (Y-index = 7) : (50.0 to 100.0)
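One possible way to write such criteria down and evaluate them for a single parameter is sketched below. The `criteria` dictionary layout and the helper `parameter_mask` are hypothetical names, not part of NumPy. Range bounds are treated as inclusive, and discrete values are compared exactly; `np.isclose` may be a better fit for measured floating-point data.

```python
import numpy as np

# Hypothetical layout: Y-index -> inclusive ranges and discrete values
criteria = {
    0: {"ranges": [(-1.0, 5.5), (10.3, 12.1)], "values": [20.32]},
    4: {"ranges": [(1.0, 800.0)], "values": [10.0]},
    7: {"ranges": [(50.0, 100.0)], "values": []},
}

def parameter_mask(matrix, y_index, ranges, values):
    """Return a (samples, locations) mask: True where any range or value matches."""
    data = matrix[:, y_index, :]
    mask = np.zeros(data.shape, dtype=bool)
    for low, high in ranges:
        # OR in each range without calling np.where or concatenating indexes
        mask |= (data >= low) & (data <= high)
    if values:
        # Exact equality against any of the discrete values
        mask |= np.isin(data, values)
    return mask

matrix = np.random.randn(100, 10, 500)
mask_0 = parameter_mask(matrix, 0, **criteria[0])
print(mask_0.shape)  # (100, 500)
```

Keeping the result as a boolean mask rather than index arrays means that multiple conditions can be merged with `|` in place.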
Additionally, I would need the ability to invert the criteria, for example:
- Parameter #1 (Y-index = 0) : NOT ( (-1.0 to 5.5) or (10.3 to 12.1) or 20.32 )
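If a parameter's criteria are first evaluated to a boolean mask, inversion reduces to an element-wise NOT of that mask. A minimal sketch with a tiny made-up array, where `data` stands in for `matrix[:, 0, :]`:

```python
import numpy as np

# Stand-in for matrix[:, 0, :] with hand-picked values
data = np.array([[0.0, 7.0, 11.0],
                 [20.32, 30.0, -3.0]])

# (-1.0 to 5.5) or (10.3 to 12.1) or 20.32
mask = (
    ((data >= -1.0) & (data <= 5.5))
    | ((data >= 10.3) & (data <= 12.1))
    | (data == 20.32)
)

# NOT ( ... ) is the element-wise complement of the mask
inverted = ~mask
print(inverted.tolist())  # [[False, True, False], [False, True, True]]
```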
The result I need is a list of tuples giving the sample index (X-axis) and location index (Z-axis) where Parameter #1 and Parameter #5 and Parameter #8 all match their conditions in the 3D matrix.
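Expressed with boolean masks, this requirement is an element-wise AND of the per-parameter masks followed by `np.argwhere`. The sketch below uses a small hand-built matrix so the result can be checked by eye. Each parameter's condition is simplified to a single range, and the names `masks`, `combined` and `result` are illustrative.

```python
import numpy as np

# Small stand-in: 3 samples, 10 parameters, 4 locations,
# filled with a value that matches none of the conditions
matrix = np.full((3, 10, 4), -999.0)

# All three parameters match at (sample 1, location 2)
matrix[1, 0, 2] = 0.5    # inside (-1.0 to 5.5)
matrix[1, 4, 2] = 10.0   # inside (1.0 to 800.0)
matrix[1, 7, 2] = 75.0   # inside (50.0 to 100.0)
# Only Parameter #1 matches at (sample 0, location 0), so it must not be reported
matrix[0, 0, 0] = 0.5

masks = [
    (matrix[:, 0, :] >= -1.0) & (matrix[:, 0, :] <= 5.5),
    (matrix[:, 4, :] >= 1.0) & (matrix[:, 4, :] <= 800.0),
    (matrix[:, 7, :] >= 50.0) & (matrix[:, 7, :] <= 100.0),
]

# AND across parameters, then list the (sample, location) pairs
combined = np.logical_and.reduce(masks)
result = [tuple(int(i) for i in pair) for pair in np.argwhere(combined)]
print(result)  # [(1, 2)]
```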
I've been looking at np.intersect1d. I've also been using np.where in a loop, which is very inefficient, such as:
import numpy as np

matrix = np.random.randn(100, 10, 500)

# Example criteria keyed by Y-index (placeholder values taken from the list above)
parameters = [0, 4, 7]
range_values = {0: [(-1.0, 5.5), (10.3, 12.1)], 4: [(1.0, 800.0)], 7: [(50.0, 100.0)]}
discrete_values = {0: [20.32], 4: [10.0], 7: []}

net_array = None
for parameter in parameters:
    total_result = None
    values = matrix[:, parameter, :]

    # Collect the (sample, location) indexes matching any range
    for lower_range_value, upper_range_value in range_values[parameter]:
        result = np.where((values >= lower_range_value) & (values <= upper_range_value))
        if result[0].size > 0:
            if total_result is None:
                total_result = result
            else:
                concat_0 = np.concatenate((total_result[0], result[0]))
                concat_1 = np.concatenate((total_result[1], result[1]))
                total_result = (concat_0, concat_1)

    # Collect the indexes matching any discrete value
    for discrete_value in discrete_values[parameter]:
        result = np.where(values == discrete_value)
        if result[0].size > 0:
            if total_result is None:
                total_result = result
            else:
                concat_0 = np.concatenate((total_result[0], result[0]))
                concat_1 = np.concatenate((total_result[1], result[1]))
                total_result = (concat_0, concat_1)

    # Keep only the pairs that also matched every earlier parameter
    if total_result is not None:
        if net_array is None:
            net_array = np.stack([total_result[0], total_result[1]], axis=-1)
        else:
            stacked_total_result = np.stack([total_result[0], total_result[1]], axis=-1)
            match_indexes = (net_array[:, None] == stacked_total_result).all(-1).any(1)
            net_array = net_array[match_indexes]
            if not np.any(match_indexes):
                break
Is there an efficient way of finding the sample index (X-axis) and location index (Z-axis) where one or more parameters (Y-axis) each match their criteria?