
Suppose I have an (M, N) array where each of the N "columns" holds the data recorded from one of N different machines. Let's also imagine that each of the M "rows" represents a unique "timestamp" at which data was recorded for all of the N machines.

The array (M, N) is structured in a way so that M = 0 corresponds to the very first "timestamp" (t0) and the last row (tM) represents the most recent "timestamp" recording.

Let's call this array "AX." AX[0] would yield the recorded data for N machines at the very 1st "timestamp". AX[-1] would be the most recent recordings.

Here is my array:

>>> import numpy as np
>>> AX = np.random.randn(3, 5)
>>> AX

array([[ 0.53826804, -0.9450442 , -0.10279278,  0.47251871,  0.32050493],
       [-0.97573464, -0.42359652, -0.00223274,  0.7364234 ,  0.83810714],
       [-0.07626913,  0.85246932, -0.13736392, -1.39977431, -1.39882156]])

Now imagine something went wrong and data wasn't captured consistently for every machine at every "timestamp". To create an example of what the output might look like, I followed the example linked below to insert NaNs in random positions in the array:

Create sample numpy array with randomly placed NaNs

>>> AX.ravel()[np.random.choice(AX.size, 9, replace=False)] = np.nan
>>> AX

array([[ 0.53826804, -0.9450442 ,         nan,  0.47251871,         nan],
       [        nan,         nan,         nan,  0.7364234 ,  0.83810714],
       [-0.07626913,         nan,         nan,         nan,         nan]])

Let's assume that I need to provide the most recent values of the recorded data. Ideally this would be as easy as referencing AX[-1]. In this particular case, I would hardly have any data since everything got screwed up.

>>> AX[-1]

array([-0.07626913,         nan,         nan,         nan,         nan])

GOAL:

I realize any data is better than nothing, so I would like to use the most recent value recorded for each machine. In this particular scenario, the best I could do is provide an array with the values:

[-0.07626913, -0.9450442, 0.7364234, 0.83810714]

Notice column 2 of AX had no usable data, so I just skipped its output.

I do not find np.arrays to be very intuitive, and as I read through the documentation, I am overwhelmed by the number of specialized functions and transforms.

My initial idea was to perhaps filter out all of the NaNs into a new array (AY) and then take the last row, AY[-1] (assuming this would retain its important row-based ordering). Then I realized that this would be making an array with a strange shape (I'm just using integer values here for convenience instead of AX's values):

[1,2,3],
[4,5],
[6]

Assuming it is even possible to create such an array, taking the last "row"(?) would yield [6, 5, 3] and would totally mess everything up. Padding the array with values is also bad, because the padding would end up being the "most recent value" for 4 out of 5 data points in the most recent "timestamp" row.
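
Just to make that concrete, here is roughly what I mean, using plain Python lists since (as far as I know) a NumPy array can't actually be ragged like this:

# Sketch of the "filter out the NaNs" idea, with made-up integer values
AY = [[1, 2, 3],
      [4, 5],
      [6]]

# Taking the last remaining value in each position...
ncols = max(len(row) for row in AY)
[[row[j] for row in AY if j < len(row)][-1] for j in range(ncols)]
# -> [6, 5, 3]: positionally ordered, but no longer lined up with the original machines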

Is there a way to achieve what I want in a fairly painless manner while still using the np.array structure and avoiding dataframes and panels?

Thanks!


1 Answer


This is the kind of question that can generate many interesting answers. Someone will probably come up with a better way than this, but to get things started, here's one possibility:

In [99]: AX
Out[99]: 
array([[ 0.53826804, -0.9450442 ,         nan,  0.47251871,         nan],
       [        nan,         nan,         nan,  0.7364234 ,  0.83810714],
       [-0.07626913,         nan,         nan,         nan,         nan]])

np.isfinite(AX) is a boolean array that is True where AX is not nan (and not infinite, but I assume that case is not relevant). For a boolean array B, B.argmax(axis=0) gives the indices of the first True value in each column. To get the indices of the last True value, reverse the array, take the argmax, and then subtract the result from the number of rows minus 1; that is, B.shape[0]-1 - B[::-1].argmax(axis=0). In this case, B is np.isfinite(AX), so we have:
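
A tiny illustration of that argmax trick on a single boolean column (the array here is made up purely for illustration):

b = np.array([True, False, True, False])
b.argmax()                     # -> 0, index of the first True
len(b) - 1 - b[::-1].argmax()  # -> 2, index of the last True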

In [100]: k = AX.shape[0] - 1 - np.isfinite(AX)[::-1].argmax(axis=0)

k contains the row indices where the final values occur. There is one for each column, so the corresponding column indices are simply np.arange(AX.shape[1]).
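
For the AX shown above, that works out to (note that the all-nan column 2 still gets an index, because argmax of an all-False column is 0; that's why a nan survives into last_vals below):

k  # -> array([2, 0, 2, 1, 1])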

In [101]: last_vals = AX[k, np.arange(AX.shape[1])]

last_vals is the one-dimensional array of the last non-nan values in each column, unless a column is all nan, in which case the value in last_vals is also nan:

In [102]: last_vals
Out[102]: array([-0.07626913, -0.9450442 ,         nan,  0.7364234 ,  0.83810714])

To eliminate the nan values in last_vals, you can index it with np.isfinite(last_vals):

In [103]: last_vals[np.isfinite(last_vals)]
Out[103]: array([-0.07626913, -0.9450442 ,  0.7364234 ,  0.83810714])
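
If you end up doing this often, here's one way you might package the whole thing (just a sketch; latest_values is a name made up for illustration):

import numpy as np

def latest_values(a):
    """Most recent non-nan value in each column of the 2-D array `a`.
    Columns containing no finite values at all are dropped."""
    finite = np.isfinite(a)
    # Row index of the last finite entry in each column: reverse, argmax, flip back.
    k = a.shape[0] - 1 - finite[::-1].argmax(axis=0)
    last_vals = a[k, np.arange(a.shape[1])]
    # Columns that were all nan still yield nan here, so filter them out.
    return last_vals[np.isfinite(last_vals)]

With the AX above, latest_values(AX) gives the same result as Out[103].
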
Warren Weckesser
  • This seems real ingenious! I'm gonna slowly walk myself through your steps manually to make sure I understand each part. There's a lot going on here :) –  Jul 12 '16 at 00:58
  • I appreciate the "accept", but it was probably too soon. There are quite a few clever numpythonistas who keep an eye on the stackoverflow questions, and you are more likely to get a variety of answers if you wait awhile before accepting one. – Warren Weckesser Jul 12 '16 at 01:03
  • Well shucks, I'm pretty new to this site and didn't realize that was the motivation for getting responses. Do people continue to comment on accepted answers? –  Jul 12 '16 at 01:07
  • Yes, they do. There is nothing wrong with accepting it, but some folks might not look too closely at a question if there is already an accepted answer. Waiting a day or so is probably a good idea. In this case, I feel like there's something that could be done to make this even simpler or more efficient, so I wouldn't mind seeing more answers myself. – Warren Weckesser Jul 12 '16 at 01:10
  • Yet, you did respond first with a working solution which was exactly what I was hoping for. Perhaps the nature of the question will entice the numpythonistas to bedazzle us with their solutions regardless of the question status? Or should I undo the accept(if possible) and fish for more? Anyway, thanks again. –  Jul 12 '16 at 01:12
  • Well, I will undo it then just to see what happens, but to me, you win for even helping me tackle it and with such alacrity. –  Jul 12 '16 at 01:13