
I have a list or numpy array like this:

[[3,   2,   1,   2,   3,   3  ],
 [3.1, 2.2, 1.1, 2.1, 3.3, 3.2]]

Based on the same first-row value, the second-row values should be grouped into the following lists:

[1.1], [2.1,2.2], [3.1,3.2,3.3]

For each list above I want to compute:

sum(abs(list - avg_list))
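
For example, for the group [3.1, 3.2, 3.3] the average is 3.2, so the result is |3.1-3.2| + |3.2-3.2| + |3.3-3.2| = 0.2; the three groups above give 0.0, 0.1 and 0.2 respectively.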

Besides finding all 2nd-row values that share the same 1st-row value one by one and then processing them, is there a parallel (vectorized) solution?

What I've tried is as follows:

import numpy as np

a = np.array([[3,   2,   1,   2,   3,   3  ],
              [3.1, 2.2, 1.1, 2.1, 3.3, 3.2]])

a = np.sort(a)            # sorts each row independently; harmless for this data
a_0 = np.unique(a[0, :])  # the distinct 1st-row labels: [1., 2., 3.]

result = []
for b in a_0:
    # 2nd-row values whose 1st-row label equals b
    a_1 = np.extract(a[0, :] == b, a[1, :])
    result.append(np.sum(np.abs(a_1 - np.mean(a_1))))
XJY95
  • What do you mean by a parallel-process solution? – Dani Mesejo Oct 11 '21 at 07:49
  • Possibly related: https://stackoverflow.com/a/43094244/6189984 – bartolja Oct 11 '21 at 07:50
  • Include what you have tried and how fast it is; my advice is to measure first – Dani Mesejo Oct 11 '21 at 07:55
  • @DaniMesejo I have added my current solution. By parallel I mean an approach that avoids the for loop – XJY95 Oct 11 '21 at 08:01
  • @bartolja Thanks! It seems the subsequent operations can't be done without a for loop? – XJY95 Oct 11 '21 at 08:03
  • Show _**actual code**_ of what you have tried, not the "concept of code" you _think_ you tried. _"What I've tried is using np.where with the condition = value of 1st row and using for loop to find these lists"_ does not meet that requirement. And there's no `3.2` in your original list but it somehow shows up in place of `3.1`. – aneroid Oct 11 '21 at 08:20
  • @aneroid Sorry, I have attached my code and corrected the original example – XJY95 Oct 11 '21 at 16:34
  • I'm guessing your real concern is speed, not "parallel" per se. "Vectorize" in a numpy context normally means performing the task with compiled numpy methods, so you don't need to iterate in Python. But here you are collecting, at least as an intermediate step, lists (or arrays) that can vary in length. That strongly indicates that a "pure" numpy approach isn't possible. For a no-loops approach you have to think outside the box. – hpaulj Oct 11 '21 at 17:59

1 Answer

Here's a no-loop approach. I map the data onto a nan-filled array (created with np.full) using idx, then use some of the np.nan* functions to perform the math in a way that excludes the nans.

In [101]: res = np.full((6, 3), np.nan)    # one row per value, one column per group
In [102]: idx = np.array([3,   2,   1,   2,   3,   3  ])
In [103]: data = np.array([3.1, 2.2, 1.1, 2.1, 3.3, 3.2])
In [104]: res[np.arange(6), idx-1] = data  # scatter each value into its group's column
In [105]: res
Out[105]: 
array([[nan, nan, 3.1],
       [nan, 2.2, nan],
       [1.1, nan, nan],
       [nan, 2.1, nan],
       [nan, nan, 3.3],
       [nan, nan, 3.2]])
In [106]: np.nanmean(res, axis=0)
Out[106]: array([1.1 , 2.15, 3.2 ])
In [107]: res-np.nanmean(res, axis=0)
Out[107]: 
array([[           nan,            nan, -1.0000000e-01],
       [           nan,  5.0000000e-02,            nan],
       [ 0.0000000e+00,            nan,            nan],
       [           nan, -5.0000000e-02,            nan],
       [           nan,            nan,  1.0000000e-01],
       [           nan,            nan, -4.4408921e-16]])
In [108]: np.abs(res-np.nanmean(res, axis=0))
Out[108]: 
array([[          nan,           nan, 1.0000000e-01],
       [          nan, 5.0000000e-02,           nan],
       [0.0000000e+00,           nan,           nan],
       [          nan, 5.0000000e-02,           nan],
       [          nan,           nan, 1.0000000e-01],
       [          nan,           nan, 4.4408921e-16]])
In [109]: np.nansum(np.abs(res-np.nanmean(res, axis=0)), axis=0)
Out[109]: array([0. , 0.1, 0.2])
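
Putting the session's steps into one reusable function (the name group_abs_dev and the use of idx.max() for the number of groups are my choices; this assumes the labels are the integers 1..k):

import numpy as np

def group_abs_dev(idx, data):
    # idx holds integer group labels 1..k, data the values to group
    n, k = len(data), idx.max()
    res = np.full((n, k), np.nan)                # nan-filled scratch array
    res[np.arange(n), idx - 1] = data            # scatter values into group columns
    dev = np.abs(res - np.nanmean(res, axis=0))  # deviation from each group's mean
    return np.nansum(dev, axis=0)

group_abs_dev(np.array([3, 2, 1, 2, 3, 3]),
              np.array([3.1, 2.2, 1.1, 2.1, 3.3, 3.2]))
# -> array([0. , 0.1, 0.2])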

Mapping onto a 0-filled array might also work, since sum etc. isn't bothered by excess 0s (though the mean would then need the group counts; a sketch follows).
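
A minimal sketch of that 0-filled variant, assuming a boolean mask is kept alongside (a plain mean would count the padding zeros, so the group means come from sums divided by counts, and the deviations are masked so empty cells don't contribute |0 - mean|):

import numpy as np

idx = np.array([3, 2, 1, 2, 3, 3])
data = np.array([3.1, 2.2, 1.1, 2.1, 3.3, 3.2])

res = np.zeros((6, 3))
mask = np.zeros((6, 3), dtype=bool)
res[np.arange(6), idx - 1] = data
mask[np.arange(6), idx - 1] = True   # remember which cells hold real values

counts = mask.sum(axis=0)            # group sizes: [1, 2, 3]
means = res.sum(axis=0) / counts     # the padding zeros add nothing to the sums
dev = np.abs(res - means) * mask     # zero out the padding cells
dev.sum(axis=0)                      # -> array([0. , 0.1, 0.2])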

I can't vouch for the speed.

Your code, with the missing result:

In [110]: a = np.sort(np.array((idx,data)))
     ...: a_0 = np.unique(a[0,:])
     ...: 
     ...: result = []
     ...: for b in a_0:
     ...:   a_1 = np.extract(a[0,:]==b,a[1,:])
     ...:   result.append(np.sum(np.abs(a_1-np.mean(a_1))))
In [111]: result
Out[111]: [0.0, 0.10000000000000009, 0.20000000000000018]
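
And a quick check, continuing the session, that the loop result agrees with the no-loop one:

In [112]: np.allclose(result, np.nansum(np.abs(res - np.nanmean(res, axis=0)), axis=0))
Out[112]: True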
hpaulj