
Let's say I have a huge 2D array looking schematically like this:

test = np.array([[0.1, 0.3, 0.5, 0.2, 5., np.nan, np.nan],
                 [2., 0.8, 0.1, 3., 2.5, 0.9, np.nan]])

As it's huge, I want to merge entries along an axis so that the sum of each merged group exceeds a certain value, say 1 in this case. The merged entry should take the lowest index of the merged group, and the remaining positions should be filled with NaN:

np.array([[1.1, 5., np.nan, np.nan, np.nan, np.nan, np.nan],
          [2.9, 3., 3.4, np.nan, np.nan, np.nan, np.nan]])

I know this is somehow possible by looping through one dimension of the array, collecting indices in a list, thresholding, merging and then padding, but that seems rather complicated to me (a rough sketch of such a loop is below). I also tried to use np.apply_along_axis with something like this:

digi = np.digitize(test[0], np.arange(0, np.nanmax(test[0]), 0.05), right=True)
np.bincount(digi, weights=test[0])

following the answers here and here, but the result is again only loosely related to what I want. Is there a simpler way to formulate this?
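
For reference, here is a minimal row-by-row sketch of the loop-based merging mentioned above. The grouping rule is my reading of the expected output (a value only starts a new merged entry when both it and the running total of the current group exceed the threshold); the helper name merge_row and the threshold keyword are just illustrative:

import numpy as np

def merge_row(row, threshold=1.0):
    # Merge consecutive entries of a single row: a value starts a new
    # group only if both it and the running total of the open group
    # exceed the threshold; otherwise it is added to the open group.
    # NaNs are skipped entirely.
    sums = []
    total = None
    for value in row:
        if np.isnan(value):
            continue
        if total is None:
            total = value                      # open the first group
        elif value > threshold and total > threshold:
            sums.append(total)                 # close the group, start a new one
            total = value
        else:
            total += value                     # keep accumulating
    if total is not None:
        sums.append(total)
    # pad with NaN back to the original row length
    out = np.full(row.shape, np.nan)
    out[:len(sums)] = sums
    return out

merged = np.array([merge_row(r) for r in test])
# array([[1.1, 5. , nan, nan, nan, nan, nan],
#        [2.9, 3. , 3.4, nan, nan, nan, nan]])

This reproduces the expected array above, but it still loops in plain Python over every row, hence the question.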

user3017048
  • `apply_along_axis` makes iteration over many dimensions easier, but not faster. You only need to iterate on one dimension, so `apply_along` won't even help with the simplification. – hpaulj Oct 08 '18 at 16:48
  • I would love to use index arrays for this in order to avoid going through the array and applying a function over an axis, but I'm not sure how to handle the issue that e.g. a sum over an axis would produce results of different lengths... – user3017048 Oct 09 '18 at 07:26
  • First get this working with a row by row iteration. – hpaulj Oct 09 '18 at 07:50

0 Answers