Suppose I have an array (M,N) where the values in each "column", N, represent data recordings of N different machines. Let's also imagine each "row", M, represents a unique "timestamp" where data was recorded for all of the N machines.
The array (M,N) is structured so that M = 0 corresponds to the very first "timestamp" (t0) and the last row (tm) represents the most recent "timestamp" recording.
Let's call this array "AX." AX[0] would yield the recorded data for N machines at the very 1st "timestamp". AX[-1] would be the most recent recordings.
Here is my array:
>>AX = np.random.randn(3, 5)
array([[ 0.53826804, -0.9450442 , -0.10279278, 0.47251871, 0.32050493],
[-0.97573464, -0.42359652, -0.00223274, 0.7364234 , 0.83810714],
[-0.07626913, 0.85246932, -0.13736392, -1.39977431, -1.39882156]])
Now imagine something went wrong and data wasn't captured consistently for every machine at every "timestamp". To create an example of what the output might look like, I followed the example linked below to insert NaNs at random positions in the array:
Create sample numpy array with randomly placed NaNs
>>AX.ravel()[np.random.choice(AX.size, 9, replace=False)] = np.nan
array([[ 0.53826804, -0.9450442 , nan, 0.47251871, nan],
[ nan, nan, nan, 0.7364234 , 0.83810714],
[-0.07626913, nan, nan, nan, nan]])
Let's assume that I need to provide the most recent values of the recorded data. Ideally this would be as easy as referencing AX[-1]. In this particular case, I would hardly have any data since everything got screwed up.
>>AX[-1]
array([-0.07626913, nan, nan, nan, nan])
GOAL:
I realize any data is better than nothing, so I would like to use the most recent value recorded for each machine. In this particular scenario, the best I could do is provide an array with the values:
[-0.07626913, -0.9450442, 0.7364234, 0.83810714]
Notice column 2 of AX had no usable data, so I just skipped its output.
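To pin down exactly what I mean, here is a naive column-by-column loop that produces the result above. It is only a sketch of the expected behaviour: `last_valid_per_column` is a name I made up for illustration, and AX is retyped by hand from the output shown earlier. I'm hoping there is something more idiomatic than this.

```python
import numpy as np

# AX retyped from the example above (NaNs in the same positions).
AX = np.array([
    [ 0.53826804, -0.9450442 , np.nan,     0.47251871, np.nan    ],
    [ np.nan,      np.nan,     np.nan,     0.7364234 , 0.83810714],
    [-0.07626913,  np.nan,     np.nan,     np.nan,     np.nan    ],
])

def last_valid_per_column(arr):
    """Return the bottom-most non-NaN value of each column.

    Columns that contain only NaN are skipped, so the result can be
    shorter than the number of columns.
    """
    result = []
    for col in arr.T:                    # iterate over columns (machines)
        valid = col[~np.isnan(col)]      # keep only recorded values, in row order
        if valid.size:                   # skip machines with no data at all
            result.append(valid[-1])     # most recent recorded value
    return np.array(result)

latest = last_valid_per_column(AX)
print(latest)
```

Note that skipping column 2 means the result no longer lines up with the machine indices, which may or may not matter depending on how the values are used afterwards.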
I do not find np.arrays to be very intuitive and as I read through the documentation, I am overwhelmed by the amount of specialized functions and transforms.
My initial idea was to filter out all of the NaNs into a new array (AY) and then take the last row, AY[-1] (assuming this would retain its important row-based ordering). Then I realized that this would produce an array with a strange, ragged shape (I'm just using integer values here for convenience instead of AX's values):
[1,2,3],
[4,5],
[6]
Assuming that is even possible to create, taking the last "row"(?) would yield [6,5,3] and would totally mess everything up. Padding an array with values is also bad because the most recent values would be pads for 4 out of 5 data points in the most recent "timestamp" row.
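A quick check suggests the ragged shape is not what comes out anyway. As far as I can tell, indexing with a boolean mask returns a flat 1-D array in row-major order, so the column information is lost altogether. A minimal sketch, again with AX retyped by hand from the example above:

```python
import numpy as np

# AX retyped from the example above (NaNs in the same positions).
AX = np.array([
    [ 0.53826804, -0.9450442 , np.nan,     0.47251871, np.nan    ],
    [ np.nan,      np.nan,     np.nan,     0.7364234 , 0.83810714],
    [-0.07626913,  np.nan,     np.nan,     np.nan,     np.nan    ],
])

# Keep only the non-NaN entries.
AY = AX[~np.isnan(AX)]

print(AY.shape)  # 1-D: one entry per surviving value, no rows or columns
print(AY)
```

So AY[-1] here is just a single number (the last surviving value in row-major order), not a "row" of most recent readings.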
Is there a way to achieve what I want in a fairly painless manner while still using the np.array structure and avoiding dataframes and panels?
Thanks!