
I have a dataset of nested lists, all of the same length (greater than 120). I only need the latest 120 valid (non-NaN) values. One way to do it would be to shift the NaN values in each column out of the way, so that the valid values sit together at the end, and then select the last 120.

How can I do that efficiently? I have millions of these samples.

[# Samples list
 [ # Sample 1
  [ 89.319787 1.329743 99.234670 ... 52.329743 0.319787 2.319787 ] 
  [ 84.319787 1.329743 49.329743 ... 52.329743    0.319 2.319787 ] 
  [ 12.319787      NaN 33.329743 ... 52.329743 0.319787 2.319787 ] 
  [ 33.319787 1.329743 23.329743 ... 52.329743 0.319787 2.319787 ] 
  ... 
  [ 23.319787 1.329743 45.234670 ... 52.329743  0.32721 2.319787 ] 
  [ 89.319787      NaN 99.234670 ...       NaN      NaN 2.319787 ] 
  [ 84.319787 1.329743 49.329743 ... 52.329743    0.319 2.319787 ] 
  [ 12.319787 1.329743       NaN ... 52.329743 0.319787 2.319787 ] 
  [ 33.319787 1.329743       NaN ... 52.329743      NaN 2.319787 ] 
                                                                  ],
 [ # Sample 2
  [ 89.319787 1.329743 99.234670 ... 52.329743 0.319787 2.319787 ] 
  [ 84.319787 1.329743 49.329743 ... 52.329743    0.319 2.319787 ] 
  [ 12.319787      NaN 33.329743 ... 52.329743 0.319787 2.319787 ] 
  [ 33.319787 1.329743 23.329743 ... 52.329743 0.319787 2.319787 ] 
  ... 
  [ 23.319787 1.329743 45.234670 ... 52.329743  0.32721 2.319787 ] 
  [ 89.319787      NaN 99.234670 ...       NaN      NaN 2.319787 ] 
  [ 84.319787 1.329743 49.329743 ... 52.329743    0.319 2.319787 ] 
  [ 12.319787 1.329743       NaN ... 52.329743 0.319787 2.319787 ] 
  [ 33.319787      NaN       NaN ... 52.329743      NaN 2.319787 ] 
                                                                  ],
[...],
[...],
[...],

 [ # Sample n
  [ 89.319787 1.329743 99.234670 ... 52.329743 0.319787 2.319787 ] 
  [       NaN 1.329743 49.329743 ... 52.329743    0.319 2.319787 ] 
  [ 12.319787      NaN 33.329743 ... 52.329743 0.319787 2.319787 ] 
  [ 33.319787 1.329743 23.329743 ... 52.329743 0.319787 2.319787 ] 
  ... 
  [ 23.319787 1.329743 45.234670 ... 52.329743  0.32721 2.319787 ] 
  [ 89.319787      NaN 99.234670 ...       NaN      NaN 2.319787 ] 
  [ 84.319787 1.329743 49.329743 ... 52.329743    0.319 2.319787 ] 
  [ 12.319787 1.329743       NaN ... 52.329743 0.319787 2.319787 ] 
  [ 33.319787 1.329743       NaN ... 52.329743      NaN 2.319787 ] 
                                                                  ]
]

Expected outcome (this is only one way it could be done; I only need the latest 120 valid values in each vertical "column", and every vertical "column" always has more than 120 valid values). Note: since the examples show only 9 rows per sample and the vertical column with the fewest valid elements has 6 non-NaN elements, one can use 6 instead of 120 valid values.

[# Samples list
 [ # Sample 1
  [ 89.319787      NaN       NaN ...       NaN      NaN 2.319787 ] 
  [ 84.319787      NaN       NaN ... 52.329743      NaN 2.319787 ] 
  [ 12.319787 1.329743 99.234670 ... 52.329743 0.319787 2.319787 ] 
  [ 33.319787 1.329743 49.329743 ... 52.329743    0.319 2.319787 ] 
  ... 
  [ 23.319787 1.329743 33.329743 ... 52.329743 0.319787 2.319787 ] 
  [ 89.319787 1.329743 23.329743 ... 52.329743 0.319787 2.319787 ] 
  [ 84.319787 1.329743 45.234670 ... 52.329743  0.32721 2.319787 ] 
  [ 12.319787 1.329743 49.329743 ... 52.329743    0.319 2.319787 ] 
  [ 33.319787 1.329743 99.234670 ... 52.329743 0.319787 2.319787 ] 
                                                                  ],
 [ # Sample 2
  [ 89.319787      NaN       NaN ...       NaN      NaN 2.319787 ] 
  [ 84.319787      NaN       NaN ... 52.329743      NaN 2.319787 ] 
  [ 12.319787      NaN 99.234670 ... 52.329743 0.319787 2.319787 ] 
  [ 33.319787 1.329743 49.329743 ... 52.329743    0.319 2.319787 ] 
  ... 
  [ 23.319787 1.329743 33.329743 ... 52.329743 0.319787 2.319787 ] 
  [ 89.319787 1.329743 23.329743 ... 52.329743 0.319787 2.319787 ] 
  [ 84.319787 1.329743 45.234670 ... 52.329743  0.32721 2.319787 ] 
  [ 12.319787 1.329743 49.329743 ... 52.329743    0.319 2.319787 ] 
  [ 33.319787 1.329743 99.234670 ... 52.329743 0.319787 2.319787 ] 
                                                                  ], 
[...],
[...],
[...],

 [ # Sample n
  [       NaN      NaN       NaN ...       NaN      NaN 2.319787 ] 
  [ 84.319787      NaN       NaN ... 52.329743      NaN 2.319787 ] 
  [ 12.319787      NaN 99.234670 ... 52.329743 0.319787 2.319787 ] 
  [ 33.319787 1.329743 49.329743 ... 52.329743    0.319 2.319787 ] 
  ... 
  [ 23.319787 1.329743 33.329743 ... 52.329743 0.319787 2.319787 ] 
  [ 89.319787 1.329743 23.329743 ... 52.329743 0.319787 2.319787 ] 
  [ 84.319787 1.329743 45.234670 ... 52.329743  0.32721 2.319787 ] 
  [ 12.319787 1.329743 49.329743 ... 52.329743    0.319 2.319787 ] 
  [ 33.319787 1.329743 99.234670 ... 52.329743 0.319787 2.319787 ] 
                                                                  ],
]

Sample Data:

import numpy as np

samples = [[
[89.319787,1.329743,99.234670,52.329743,0.319787,2.319787],
[84.319787,1.329743,49.329743,52.329743,0.319,2.319787],
[12.319787,np.nan,33.329743,52.329743,0.319787,2.319787],
[33.319787,1.329743,23.329743,52.329743,0.319787,2.319787],
[23.319787,1.329743,45.234670,52.329743,0.32721,2.319787],
[89.319787,np.nan,99.234670,np.nan,np.nan,2.319787],
[84.319787,1.329743,49.329743,52.329743,0.319,2.319787],
[12.319787,1.329743,np.nan,52.329743,0.319787,2.319787],
[33.319787,1.329743,np.nan,52.329743,np.nan,2.319787]
],
[
[89.319787,1.329743,99.234670,52.329743,0.319787,2.319787],
[84.319787,1.329743,49.329743,52.329743,0.319,2.319787],
[12.319787,np.nan,33.329743,52.329743,0.319787,2.319787],
[33.319787,1.329743,23.329743,52.329743,0.319787,2.319787],
[23.319787,1.329743,45.234670,52.329743,0.32721,2.319787],
[89.319787,np.nan,99.234670,np.nan,np.nan,2.319787],
[84.319787,1.329743,49.329743,52.329743,0.319,2.319787],
[12.319787,1.329743,np.nan,52.329743,0.319787,2.319787],
[33.319787,np.nan,np.nan,52.329743,np.nan,2.319787]
],
[
[89.319787,1.329743,99.234670,52.329743,0.319787,2.319787],
[np.nan,1.329743,49.329743,52.329743,0.319,2.319787],
[12.319787,np.nan,33.329743,52.329743,0.319787,2.319787],
[33.319787,1.329743,23.329743,52.329743,0.319787,2.319787],
[23.319787,1.329743,45.234670,52.329743,0.32721,2.319787],
[89.319787,np.nan,99.234670,np.nan,np.nan,2.319787],
[84.319787,1.329743,49.329743,52.329743,0.319,2.319787],
[12.319787,1.329743,np.nan,52.329743,0.319787,2.319787],
[33.319787,1.329743,np.nan,52.329743,np.nan,2.319787]]]
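
One possible vectorized approach, given only as a rough sketch: it assumes all samples fit into a single 3-D float array of shape (n_samples, n_rows, n_cols) and that every column has at least `k` valid values. The names `last_k_valid` and `demo` are made up for illustration. The idea is to stable-sort the "is valid" mask along the row axis, so NaNs move to the top of each column while the valid values keep their original order at the bottom, and then slice off the last `k` rows.

```python
import numpy as np

def last_k_valid(samples, k):
    """Hypothetical helper: last k non-NaN values per column of every sample."""
    a = np.asarray(samples, dtype=float)   # shape (n_samples, n_rows, n_cols)
    valid = ~np.isnan(a)
    # Stable sort of the boolean mask along the row axis:
    # False (NaN) sorts first, True (valid) sorts last,
    # and the original order is preserved within each group.
    order = np.argsort(valid, axis=1, kind="stable")
    justified = np.take_along_axis(a, order, axis=1)
    # Only NaN-free if every column really has at least k valid values.
    return justified[:, -k:, :]

# Small made-up array: 2 samples, 5 rows, 2 columns.
demo = np.array([
    [[1.0, np.nan],
     [np.nan, 2.0],
     [3.0, 4.0],
     [5.0, np.nan],
     [7.0, 8.0]],
    [[np.nan, 10.0],
     [11.0, 12.0],
     [13.0, np.nan],
     [np.nan, 14.0],
     [15.0, 16.0]],
])

out = last_k_valid(demo, k=3)
print(out[0, :, 0])   # [3. 5. 7.]
print(out[1, :, 1])   # [12. 14. 16.]
```

With the sample data above, the call would presumably be `last_k_valid(samples, 6)`. For millions of samples it may be necessary to run this over chunks of the array to keep memory use in check.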
  • This should help: [`Python: Justifying NumPy array`](https://stackoverflow.com/questions/44558215/python-justifying-numpy-array). – Divakar Aug 22 '20 at 09:35
  • Looks interesting. Is there a way to do this in a vectorized way (all samples at the 'same' time)? Otherwise I would have to iterate over millions of samples. – La-Li-Lu-Le-Lo Aug 22 '20 at 09:41

0 Answers