
I have a scatter plot with 65 points. I want to group these points into 36 lists, one per rectangular region. This corresponds to dividing the area in which the graph is plotted into 36 rectangular regions. Is there a "pythonic" way to do this without writing 36 conditionals?

For example, to keep things simple, the x and y coordinates shown below contain 20 points. Is there a simple way to divide them into 10 regions of equal size?

x = [484248.77, 481335.51, 473814.14, 488522.14, 481703.17, 479105.54, 480700.85, 482816.02, 484579.26, 483984.83, 483278.12, 473877.12, 484711.57, 481574.8, 484374.02, 483920.51, 484318.97, 482229.34, 481458.91, 487751.09]

y = [7421919.17, 7417638.85, 7426640.34, 7420657.74, 7423742.49, 7422636.23, 7422958.38, 7422550.7, 7421886.44, 7421707.53, 7415756.43, 7424344.33, 7422787.38, 7418556.75, 7420368.91, 7421946.9, 7419293.06, 7424612.41, 7427565.78, 7405473.74]

donut

1 Answer


You can use np.digitize together with the approach from this Q&A. You'll have to decide for yourself how to lay out the bins, i.e. 10x1, 5x2, 2x5 or 1x10.

import numpy as np
from scipy import sparse

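def sort_to_bins_sparse(idx, data, mx=-1):
    # Each point becomes one row of a sparse matrix with a single entry in
    # column idx[i]; converting CSR -> CSC groups the entries by column (bin).
    if mx==-1:
        mx = idx.max() + 1
    aux = sparse.csr_matrix((data, idx, np.arange(len(idx)+1)), (len(idx), mx)).tocsc()
    # Per bin: the data values and the original positions of the points.
    # Splitting at indptr[1:] leaves one extra empty array at the end.
    return np.split(aux.data, aux.indptr[1:]), \
        np.split(aux.indices, aux.indptr[1:])

def bin(data, bincounts):
    # Note: this name shadows Python's built-in bin().
    data = np.asanyarray(data)
    # 0-based bin index per axis, using equal-width bins between min and max.
    idx = [np.digitize(d, np.linspace(d.min(), d.max(), b, endpoint=False))-1
           for d, b in zip(data, bincounts)]
    # Combine the per-axis indices into one flat bin number (row-major order).
    flat = np.ravel_multi_index(idx, bincounts)
    _, idx = sort_to_bins_sparse(flat, data[0])
    return [data[:,i] for i in idx]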

x = [484248.77, 481335.51, 473814.14, 488522.14, 481703.17, 479105.54, 480700.85, 482816.02, 484579.26, 483984.83, 483278.12, 473877.12, 484711.57, 481574.8, 484374.02, 483920.51, 484318.97, 482229.34, 481458.91, 487751.09]

y = [7421919.17, 7417638.85, 7426640.34, 7420657.74, 7423742.49, 7422636.23, 7422958.38, 7422550.7, 7421886.44, 7421707.53, 7415756.43, 7424344.33, 7422787.38, 7418556.75, 7420368.91, 7421946.9, 7419293.06, 7424612.41, 7427565.78, 7405473.74]

print(bin((x,y),(5,2)))

Output:

[array([], shape=(2, 0), dtype=float64), array([[ 473814.14,  473877.12],
       [7426640.34, 7424344.33]]), array([], shape=(2, 0), dtype=float64), array([[ 479105.54],
       [7422636.23]]), array([], shape=(2, 0), dtype=float64), array([[ 481335.51,  481703.17,  480700.85,  481574.8 ,  482229.34,
         481458.91],
       [7417638.85, 7423742.49, 7422958.38, 7418556.75, 7424612.41,
        7427565.78]]), array([[ 483278.12],
       [7415756.43]]), array([[ 484248.77,  482816.02,  484579.26,  483984.83,  484711.57,
         484374.02,  483920.51,  484318.97],
       [7421919.17, 7422550.7 , 7421886.44, 7421707.53, 7422787.38,
        7420368.91, 7421946.9 , 7419293.06]]), array([[ 487751.09],
       [7405473.74]]), array([[ 488522.14],
       [7420657.74]]), array([], shape=(2, 0), dtype=float64)]
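
To make the indexing step easier to follow, here is a minimal sketch of just the per-axis binning and the flattening, run on a handful of made-up points rather than the data above. The helper name `bin_indices` and the toy coordinates are only for illustration; the calls are the same `np.digitize`, `np.linspace` and `np.ravel_multi_index` used in the answer.

```python
import numpy as np

def bin_indices(values, n_bins):
    """Return a 0-based bin index in [0, n_bins) for each value (equal-width bins)."""
    values = np.asarray(values, dtype=float)
    # Left edges of n_bins equal-width bins; the maximum falls into the last bin.
    edges = np.linspace(values.min(), values.max(), n_bins, endpoint=False)
    return np.digitize(values, edges) - 1

# Toy coordinates (illustrative only, not the data from the question).
x = np.array([0.0, 1.0, 4.9, 5.0, 10.0])
y = np.array([0.0, 9.0, 2.0, 6.0, 10.0])

ix = bin_indices(x, 5)  # bin along x: [0 0 2 2 4]
iy = bin_indices(y, 2)  # bin along y: [0 1 0 1 1]

# One flat region number per point on a 5x2 grid (row-major: ix * 2 + iy).
flat = np.ravel_multi_index((ix, iy), (5, 2))
print(flat)  # [0 1 4 5 9]
```

Points that share a flat number belong to the same region, which is what the sparse-matrix helper then groups together.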
Paul Panzer
  • Thank you, this is exactly what I needed. – donut Jul 17 '19 at 02:26
  • One last question, is it possible to find the adjacency between these regions? For example, position 0 is a neighbour of positions 3 and 7, etc. – donut Jul 17 '19 at 06:41
  • You can use `np.unravel_index`. In the example you would call `np.unravel_index([0,3,7], (5,2))`, which gives [0 1 3] and [0 1 1]. From this you can see that 0 and 3 are diagonal neighbors, while 7 is not connected to the other two. – Paul Panzer Jul 17 '19 at 07:40
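
Building on that comment, a rough sketch of how one might list the neighbours of a region from its flat index. The function name `neighbours` and the `diagonal` switch are assumptions made for this illustration; only `np.unravel_index` and `np.ravel_multi_index` come from the discussion above.

```python
import numpy as np

def neighbours(flat_index, shape, diagonal=False):
    """Flat indices of the grid cells adjacent to `flat_index` on a grid of `shape`."""
    row, col = np.unravel_index(flat_index, shape)
    result = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            if not diagonal and dr != 0 and dc != 0:
                continue  # skip corner-touching cells unless requested
            r, c = row + dr, col + dc
            # Keep only cells that lie inside the grid.
            if 0 <= r < shape[0] and 0 <= c < shape[1]:
                result.append(int(np.ravel_multi_index((r, c), shape)))
    return result

print(neighbours(0, (5, 2)))                 # [1, 2]
print(neighbours(0, (5, 2), diagonal=True))  # [1, 2, 3]
print(neighbours(7, (5, 2)))                 # [5, 6, 9]
```

With `diagonal=True`, region 3 shows up as a neighbour of region 0, matching the comment above, while region 7 does not.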