Binning then sorting arrays in each bin but keeping their indices together

Question

I have two arrays whose indices are related: x[0] belongs with y[0], and so on, so the pairs need to stay together. I have binned the x array into two bins as shown in the code below.

import numpy as np

x = [1,4,7,0,5]
y = [.1,.7,.6,.8,.3]

binx = [0,4,9]
index = np.digitize(x,binx)

Giving me the following:

In [1]: index
Out[1]: array([1, 2, 2, 1, 2])

So far so good. (I think)

The y array is a parameter telling me how well measured each x data point is (.9 is better than .2), so I'm using the following code to pick out the best of the y array:

y.sort()
ysorted = y[int(len(y) * .5):]

which gives me:

In [2]: ysorted
Out[2]: [0.6, 0.7, 0.8]

giving me the last 50% of the array. Again, this is what I want.

My question is how do I combine these two operations? From each bin, I need to get the best 50% and put these new values into a new x and new y array. Again, keeping the indices of each array organized. Or is there an easier way to do this? I hope this makes sense.

Possible duplicate of [Is it possible to sort two lists(which reference each other) in the exact same way?](http://stackoverflow.com/questions/9764298/is-it-possible-to-sort-two-listswhich-reference-each-other-in-the-exact-same-w) — ephemient, Mar 12 '17 at 21:20

Andrew Che · Answer 1 · score 0 · 2017-03-12T20:41:49.617

You should make a list of pairs from your x and y lists. This can be done with the zip function:

x = [1,4,7,0,5]
y = [.1,.7,.6,.8,.3]
values = list(zip(x, y))  # list() is needed on Python 3, where zip returns an iterator
values
[(1, 0.1), (4, 0.7), (7, 0.6), (0, 0.8), (5, 0.3)]

To sort such a list of pairs by a specific element of each pair, you can use the key parameter of sort:

values.sort(key=lambda pair: pair[1])
values
[(1, 0.1), (5, 0.3), (7, 0.6), (4, 0.7), (0, 0.8)]

Then you may do whatever you want with this sorted list of pairs.
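
As one possible continuation, here is a rough sketch of binning the pairs by their x component and then keeping the best-measured half of each bin. It rests on a few assumptions: the standard-library bisect_right stands in for np.digitize (for increasing bin edges it gives the same bin numbers), the slice mirrors the one in the question, and the helper name best_half_per_bin is made up for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

def best_half_per_bin(x, y, edges):
    """Group (x, y) pairs by the bin of x, then keep the best half of each bin by y."""
    bins = defaultdict(list)
    for pair in zip(x, y):
        # For increasing edges, bisect_right returns the same bin number
        # that np.digitize would assign to pair[0].
        bins[bisect_right(edges, pair[0])].append(pair)

    kept = {}
    for bin_number, pairs in bins.items():
        pairs.sort(key=lambda pair: pair[1])              # worst to best by y
        kept[bin_number] = pairs[int(len(pairs) * .5):]   # same slice as in the question
    return kept

x = [1, 4, 7, 0, 5]
y = [.1, .7, .6, .8, .3]
result = best_half_per_bin(x, y, [0, 4, 9])
```

Each value in result is a list of (x, y) tuples, so the two components never get separated. With an odd number of points in a bin, the slice keeps the larger half.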

edited Mar 12 '17 at 20:41

answered Mar 12 '17 at 20:36

Andrew Che


So, that's really cool! But how do I access just the x-component in the list of pairs so that I can sort them into bins? – dontdeimos Mar 12 '17 at 22:14
values[i][0] is the x-component, values[i][1] is the y one. Those "pairs" as I called them are tuples which work very much like lists – Andrew Che Mar 12 '17 at 22:19
So, index = np.digitize(values[i][0],binx) should work? It doesn't, what am I doing wrong? – dontdeimos Mar 12 '17 at 22:23
Not very familiar with numpy. What exactly do you need to pass to np.digitize as a first parameter? – Andrew Che Mar 12 '17 at 22:32
With my code, you have the list of pairs. If you need to pass np.digitize the x list sorted by the accuracy stored in the y list, you need to retrieve the x values from the list of pairs, like [pair[0] for pair in values] – Andrew Che Mar 12 '17 at 22:36
I didn't completely understand your problem, just noticed that you need to sort values in pairs and came up with the solution :) A dull question: if you need to pass the x array to np.digitize, what's your question about? Or do you need to get it sorted by accuracy before invoking np.digitize? – Andrew Che Mar 12 '17 at 22:43
I needed to sort into bins using np.digitize and then sort by the accuracy in each bin. All while the indices are organized. – dontdeimos Mar 12 '17 at 22:47
Do you still need help? – Andrew Che Mar 12 '17 at 22:48
Is it possible to put the tuple into np.digitize or any binning function and only sort by the x component? Before sorting by accuracy. – dontdeimos Mar 12 '17 at 22:51
You want to keep the accuracy attached to x values while binning? That's a good question. Probably needs a separate SO question. I personally don't know – Andrew Che Mar 12 '17 at 23:05

score 0 · Accepted Answer · answered Mar 13 '17 at 02:11

Many numpy functions have arg... variants that don't operate "by value" but rather "by index". In your case argsort does what you want:

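import numpy as np

x = np.asarray(x)  # plain lists can't be indexed with an index array
y = np.asarray(y)
order = np.argsort(y)
# order is an array of indices such that
# y[order] is sorted
top50 = order[len(order) // 2 :]
top50x = x[top50]
top50y = y[top50]
# now top50x are the x corresponding 1-to-1 to the 50% best y (top50y)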
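
The snippet above ranks the whole array at once. If the goal is the best 50% within each np.digitize bin, the same argsort idea could be applied bin by bin. The following is only a sketch under that reading of the question; the names keep, new_x and new_y are illustrative.

```python
import numpy as np

x = np.array([1, 4, 7, 0, 5])
y = np.array([.1, .7, .6, .8, .3])
binx = [0, 4, 9]

index = np.digitize(x, binx)  # bin number of each x, here [1 2 2 1 2]

keep = []
for b in np.unique(index):
    members = np.flatnonzero(index == b)      # positions of the points in bin b
    order = members[np.argsort(y[members])]   # those positions, worst y to best y
    keep.append(order[len(order) // 2:])      # best half of this bin
keep = np.concatenate(keep)

# Index both arrays with the same positions so x and y stay paired.
new_x = x[keep]
new_y = y[keep]
```

Because keep holds positions in the original arrays, it can also be used to pull matching entries from any other array that shares the same indexing.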
