I have a dataset of nested lists, all of the same length (more than 120 rows per sample). For each vertical column, I only need the latest 120 valid (non-NaN) values. One way to do this is to move the NaN values in each column up to the top, so the valid values collect at the bottom in their original order, and then select the last 120 rows.
How can I do that efficiently? I have millions of these samples.
[# Samples list
[ # Sample 1
[ 89.319787 1.329743 99.234670 ... 52.329743 0.319787 2.319787 ]
[ 84.319787 1.329743 49.329743 ... 52.329743 0.319 2.319787 ]
[ 12.319787 NaN 33.329743 ... 52.329743 0.319787 2.319787 ]
[ 33.319787 1.329743 23.329743 ... 52.329743 0.319787 2.319787 ]
...
[ 23.319787 1.329743 45.234670 ... 52.329743 0.32721 2.319787 ]
[ 89.319787 NaN 99.234670 ... NaN NaN 2.319787 ]
[ 84.319787 1.329743 49.329743 ... 52.329743 0.319 2.319787 ]
[ 12.319787 1.329743 NaN ... 52.329743 0.319787 2.319787 ]
[ 33.319787 1.329743 NaN ... 52.329743 NaN 2.319787 ]
],
[ # Sample 2
[ 89.319787 1.329743 99.234670 ... 52.329743 0.319787 2.319787 ]
[ 84.319787 1.329743 49.329743 ... 52.329743 0.319 2.319787 ]
[ 12.319787 NaN 33.329743 ... 52.329743 0.319787 2.319787 ]
[ 33.319787 1.329743 23.329743 ... 52.329743 0.319787 2.319787 ]
...
[ 23.319787 1.329743 45.234670 ... 52.329743 0.32721 2.319787 ]
[ 89.319787 NaN 99.234670 ... NaN NaN 2.319787 ]
[ 84.319787 1.329743 49.329743 ... 52.329743 0.319 2.319787 ]
[ 12.319787 1.329743 NaN ... 52.329743 0.319787 2.319787 ]
[ 33.319787 NaN NaN ... 52.329743 NaN 2.319787 ]
],
[...],
[...],
[...],
[ # Sample n
[ 89.319787 1.329743 99.234670 ... 52.329743 0.319787 2.319787 ]
[ NaN 1.329743 49.329743 ... 52.329743 0.319 2.319787 ]
[ 12.319787 NaN 33.329743 ... 52.329743 0.319787 2.319787 ]
[ 33.319787 1.329743 23.329743 ... 52.329743 0.319787 2.319787 ]
...
[ 23.319787 1.329743 45.234670 ... 52.329743 0.32721 2.319787 ]
[ 89.319787 NaN 99.234670 ... NaN NaN 2.319787 ]
[ 84.319787 1.329743 49.329743 ... 52.329743 0.319 2.319787 ]
[ 12.319787 1.329743 NaN ... 52.329743 0.319787 2.319787 ]
[ 33.319787 1.329743 NaN ... 52.329743 NaN 2.319787 ]
]
]
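A minimal sketch of the "move NaNs up, then take the last rows" approach, assuming the samples fit in one NumPy array of shape (samples, rows, columns), e.g. via `np.asarray(samples, dtype=float)`, which requires every sample to have the same number of rows. A stable `argsort` on the validity mask moves the NaNs to the top of each column without reordering the valid values, and `np.take_along_axis` applies that order. The function name `latest_valid` is hypothetical.

```python
import numpy as np


def latest_valid(arr: np.ndarray, n: int) -> np.ndarray:
    """Return the last `n` non-NaN values of every column of every sample.

    Assumes `arr` has shape (samples, rows, cols) and that every column
    contains at least `n` non-NaN values.
    """
    # False (sorts first) for NaN, True (sorts last) for valid values.
    valid = ~np.isnan(arr)
    # Stable sort along the row axis: NaNs move to the top of each column,
    # valid values keep their original (chronological) order below them.
    order = np.argsort(valid, axis=1, kind="stable")
    shifted = np.take_along_axis(arr, order, axis=1)
    # The bottom `n` rows now hold the latest `n` valid values per column.
    return shifted[:, -n:, :]


# Tiny made-up example: 1 sample, 4 rows, 2 columns.
demo = np.array([[[1.0, np.nan],
                  [2.0, 10.0],
                  [np.nan, 20.0],
                  [3.0, 30.0]]])
print(latest_valid(demo, 3).tolist())  # → [[[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]]
```

The intermediate `shifted` array has the layout of the expected outcome below (NaNs at the top of each column). If millions of samples do not fit in memory at once, the same function can be applied to chunks of samples along the first axis.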
Expected outcome (this is only one way it could be done: I only need the latest 120 valid values in each vertical "column", and each column is guaranteed to have more than 120 valid values). Note: since the examples only show 9 rows per sample, and the column with the fewest valid elements has 6 non-NaN values, you can use 6 instead of 120 for the example data.
[# Samples list
[ # Sample 1
[ 89.319787 NaN NaN ... NaN NaN 2.319787 ]
[ 84.319787 NaN NaN ... 52.329743 NaN 2.319787 ]
[ 12.319787 1.329743 99.234670 ... 52.329743 0.319787 2.319787 ]
[ 33.319787 1.329743 49.329743 ... 52.329743 0.319 2.319787 ]
...
[ 23.319787 1.329743 33.329743 ... 52.329743 0.319787 2.319787 ]
[ 89.319787 1.329743 23.329743 ... 52.329743 0.319787 2.319787 ]
[ 84.319787 1.329743 45.234670 ... 52.329743 0.32721 2.319787 ]
[ 12.319787 1.329743 99.234670 ... 52.329743 0.319 2.319787 ]
[ 33.319787 1.329743 49.329743 ... 52.329743 0.319787 2.319787 ]
],
[ # Sample 2
[ 89.319787 NaN NaN ... NaN NaN 2.319787 ]
[ 84.319787 NaN NaN ... 52.329743 NaN 2.319787 ]
[ 12.319787 NaN 99.234670 ... 52.329743 0.319787 2.319787 ]
[ 33.319787 1.329743 49.329743 ... 52.329743 0.319 2.319787 ]
...
[ 23.319787 1.329743 33.329743 ... 52.329743 0.319787 2.319787 ]
[ 89.319787 1.329743 23.329743 ... 52.329743 0.319787 2.319787 ]
[ 84.319787 1.329743 45.234670 ... 52.329743 0.32721 2.319787 ]
[ 12.319787 1.329743 99.234670 ... 52.329743 0.319 2.319787 ]
[ 33.319787 1.329743 49.329743 ... 52.329743 0.319787 2.319787 ]
],
[...],
[...],
[...],
[ # Sample n
[ NaN NaN NaN ... NaN NaN 2.319787 ]
[ 89.319787 NaN NaN ... 52.329743 NaN 2.319787 ]
[ 12.319787 1.329743 99.234670 ... 52.329743 0.319787 2.319787 ]
[ 33.319787 1.329743 49.329743 ... 52.329743 0.319 2.319787 ]
...
[ 23.319787 1.329743 33.329743 ... 52.329743 0.319787 2.319787 ]
[ 89.319787 1.329743 23.329743 ... 52.329743 0.319787 2.319787 ]
[ 84.319787 1.329743 45.234670 ... 52.329743 0.32721 2.319787 ]
[ 12.319787 1.329743 99.234670 ... 52.329743 0.319 2.319787 ]
[ 33.319787 1.329743 49.329743 ... 52.329743 0.319787 2.319787 ]
],
]
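The value 6 in the note above can be checked directly, and the same check is a cheap way to confirm the "more than 120 valid values per column" guarantee before slicing. A minimal sketch assuming NumPy; `min_valid_count` is a hypothetical helper, and the array is Sample 2 from the sample data below, which is the sample containing the sparsest column.

```python
import numpy as np


def min_valid_count(arr: np.ndarray) -> int:
    """Smallest number of non-NaN values found in any column.

    The row axis is assumed to be the second-to-last axis, so this works for
    a single sample (rows, cols) or a stack of samples (samples, rows, cols).
    """
    return int((~np.isnan(arr)).sum(axis=-2).min())


# Sample 2 from the sample data (9 rows x 6 columns).
sample_2 = np.array([
    [89.319787, 1.329743, 99.234670, 52.329743, 0.319787, 2.319787],
    [84.319787, 1.329743, 49.329743, 52.329743, 0.319, 2.319787],
    [12.319787, np.nan, 33.329743, 52.329743, 0.319787, 2.319787],
    [33.319787, 1.329743, 23.329743, 52.329743, 0.319787, 2.319787],
    [23.319787, 1.329743, 45.234670, 52.329743, 0.32721, 2.319787],
    [89.319787, np.nan, 99.234670, np.nan, np.nan, 2.319787],
    [84.319787, 1.329743, 49.329743, 52.329743, 0.319, 2.319787],
    [12.319787, 1.329743, np.nan, 52.329743, 0.319787, 2.319787],
    [33.319787, np.nan, np.nan, 52.329743, np.nan, 2.319787],
])
print(min_valid_count(sample_2))  # → 6 (the second column has 3 NaNs)

n = 6
# Guard before slicing: every column must hold at least `n` valid values.
assert min_valid_count(sample_2) >= n
```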
Sample data:
import numpy as np

samples = [[
[89.319787,1.329743,99.234670,52.329743,0.319787,2.319787],
[84.319787,1.329743,49.329743,52.329743,0.319,2.319787],
[12.319787,np.nan,33.329743,52.329743,0.319787,2.319787],
[33.319787,1.329743,23.329743,52.329743,0.319787,2.319787],
[23.319787,1.329743,45.234670,52.329743,0.32721,2.319787],
[89.319787,np.nan,99.234670,np.nan,np.nan,2.319787],
[84.319787,1.329743,49.329743,52.329743,0.319,2.319787],
[12.319787,1.329743,np.nan,52.329743,0.319787,2.319787],
[33.319787,1.329743,np.nan,52.329743,np.nan,2.319787]
],
[
[89.319787,1.329743,99.234670,52.329743,0.319787,2.319787],
[84.319787,1.329743,49.329743,52.329743,0.319,2.319787],
[12.319787,np.nan,33.329743,52.329743,0.319787,2.319787],
[33.319787,1.329743,23.329743,52.329743,0.319787,2.319787],
[23.319787,1.329743,45.234670,52.329743,0.32721,2.319787],
[89.319787,np.nan,99.234670,np.nan,np.nan,2.319787],
[84.319787,1.329743,49.329743,52.329743,0.319,2.319787],
[12.319787,1.329743,np.nan,52.329743,0.319787,2.319787],
[33.319787,np.nan,np.nan,52.329743,np.nan,2.319787]
],
[
[89.319787,1.329743,99.234670,52.329743,0.319787,2.319787],
[np.nan,1.329743,49.329743,52.329743,0.319,2.319787],
[12.319787,np.nan,33.329743,52.329743,0.319787,2.319787],
[33.319787,1.329743,23.329743,52.329743,0.319787,2.319787],
[23.319787,1.329743,45.234670,52.329743,0.32721,2.319787],
[89.319787,np.nan,99.234670,np.nan,np.nan,2.319787],
[84.319787,1.329743,49.329743,52.329743,0.319,2.319787],
[12.319787,1.329743,np.nan,52.329743,0.319787,2.319787],
[33.319787,1.329743,np.nan,52.329743,np.nan,2.319787]]]
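Because every column is guaranteed to hold at least n valid values, the sort can also be replaced by a count: a reversed cumulative sum of the validity mask numbers the valid values of each column from the bottom, and a boolean mask then keeps the latest n of them directly. This is a swapped-in alternative to the shift-and-slice idea, sketched with NumPy under the same (samples, rows, columns) layout assumption; `latest_valid_masked` is a hypothetical name, and whether it is faster than a sort-based version would need to be measured on the real data.

```python
import numpy as np


def latest_valid_masked(arr: np.ndarray, n: int) -> np.ndarray:
    """Return the last `n` non-NaN values of every column, without sorting.

    Assumes `arr` has shape (samples, rows, cols) and that every column holds
    at least `n` non-NaN values; otherwise the final reshape raises an error.
    """
    s, _, c = arr.shape
    valid = ~np.isnan(arr)
    # For each row: how many valid values sit at or below it in its column.
    # A valid row with count 1 is the latest valid value, 2 the one before, etc.
    from_bottom = np.cumsum(valid[:, ::-1, :], axis=1)[:, ::-1, :]
    keep = valid & (from_bottom <= n)
    # Put the row axis last so boolean indexing (row-major order) walks each
    # column top to bottom; each (sample, column) pair yields exactly `n` values.
    picked = arr.transpose(0, 2, 1)[keep.transpose(0, 2, 1)]
    return picked.reshape(s, c, n).transpose(0, 2, 1)
```

Applied to the sample data above as `latest_valid_masked(np.asarray(samples, dtype=float), 6)`, it should return an array of shape (3, 6, 6) containing no NaNs, with each column's values in their original order.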