Fast way to find the data centralization area in array

Question

Suppose I have the following data:

[    0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0   255     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0   255  5610  2550  2805  5355  6375  6120 10965 11220
 10200 10200 10455 16575 23205 27795 28815 31875 32385 35190 37995 46410
 45645 48195 53295 53805 52020 53040 46665 47685 44625 44625 37740 32895
 27285 29580 36210 39780 47430 49215 53805 54825 58905 60435 60180 62220
 58650 59670 61965 64260 62730 69360 69360 67065 68340 68085 68340 62985
 66555 64260 61710 66555 72420 71910 71910 71910 71145 70890 67065 65280
 67065 64005 60690 64005 54825 55335 48960 43095 44115 46410 40800 42330
 37995 52275 56355 60945 68850 70635 69870 71910 73185 75735 78540 77010
 81090 77775 78285 77265 76245 72165 77520 72930 73185 72675 66555 66045
 66045 66300 63495 63240 59925 56610 56355 52275 56355 49470 45135 45900
 43095 40290 46665 61710 66300 65790 67830 72675 75735 73440 75480 77010
 77265 81600 81600 79050 81600 84150 82875 81600 82875 73950 76755 65790
 62475 66810 63495 65790 64260 70380 73185 69615 70635 71400 66045 61710
 53805 48195 46155 47175 47175 48960 48195 49725 45135 37995 35445 32640
 30600 28560 20145  9690 12495 11730 14280 14025 13770 13005 15045 12240
  9690  7395  5100  4335  3315  1785  2295  1530   510   255     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0   255     0     0     0     0   255
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0   255     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0   255     0     0     0     0     0     0     0
     0     0     0   255     0     0   255     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0   255   255     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0   255     0     0     0     0     0
     0     0     0     0     0     0     0     0     0   255     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0
     0     0     0     0     0     0     0     0     0     0     0     0]

Plotted as a line chart, the data looks like this: [line chart of the data above; image not included]

I would like to find the region of the array where most of the data is concentrated. Is there a faster method than brute force?

OR

I used np.nonzero(x) to get the indices of the nonzero values:

(array([ 57, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158,
        159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171,
        172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184,
        185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197,
        198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210,
        211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223,
        224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236,
        237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249,
        250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262,
        263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275,
        276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288,
        289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301,
        302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314,
        315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327,
        328, 329, 330, 331, 332, 333, 366, 371, 439, 460, 471, 474, 524,
        525, 546, 561], dtype=int64),)

How can I group nearby nonzero indices into the same range? For example: 57, 147-474, 524-561.
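A minimal vectorized sketch, assuming two nonzero indices belong to the same range when they are at most `max_gap` apart (`group_ranges` and `max_gap` are hypothetical names). The threshold has to be tuned: in the indices above the gap 371 to 439 (68) is larger than 474 to 524 (50), so no single threshold yields exactly the example ranges. With `max_gap=1` it reduces to grouping strictly consecutive indices.

```python
import numpy as np

def group_ranges(x, max_gap):
    """Group the nonzero indices of a 1-D array into (start, end) ranges.

    Neighbouring nonzero indices whose distance is <= max_gap end up in
    the same range; `end` is inclusive.
    """
    idx = np.flatnonzero(x)
    if idx.size == 0:
        return []
    # Positions in idx where the gap to the previous index exceeds max_gap
    breaks = np.flatnonzero(np.diff(idx) > max_gap) + 1
    starts = idx[np.concatenate(([0], breaks))]
    ends = idx[np.concatenate((breaks - 1, [idx.size - 1]))]
    return list(zip(starts.tolist(), ends.tolist()))

# Toy array (not the question's data): nonzeros at 1, 5-7 and 20
x = np.zeros(25, dtype=int)
x[[1, 5, 6, 7, 20]] = 255
print(group_ranges(x, max_gap=5))  # [(1, 7), (20, 20)]
print(group_ranges(x, max_gap=1))  # [(1, 1), (5, 7), (20, 20)]
```

Everything is done with whole-array operations (`np.diff`, `np.flatnonzero`), so there is no Python-level loop over the elements.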

Are you trying to find the area maximized by a fixed-sized window? Perhaps finding the maximum of a convolution with a fixed window is your best bet — jerpint, Jun 20 '18 at 03:52
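For reference, a sketch of the fixed-window idea from this comment, assuming the goal is the window of a hypothetical width `w` containing the most nonzero entries: convolve a 0/1 indicator with a box of ones and take the argmax (`densest_window` is an illustrative name).

```python
import numpy as np

def densest_window(x, w):
    """Return (start, count) for the length-w window with the most nonzeros."""
    indicator = (np.asarray(x) != 0).astype(int)
    # 'valid' mode: counts[i] = number of nonzeros in x[i:i+w]
    counts = np.convolve(indicator, np.ones(w, dtype=int), mode='valid')
    start = int(np.argmax(counts))  # first window reaching the maximum
    return start, int(counts[start])

x = np.zeros(25, dtype=int)
x[[1, 5, 6, 7, 20]] = 255
print(densest_window(x, 4))  # (4, 3)
```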
No, I actually want to cluster the data. That is, I want to group nearby indices that contain nonzero values. — SinLok, Jun 20 '18 at 03:59
Probably you are looking for the answer to this question: https://stackoverflow.com/questions/7352684/how-to-find-the-groups-of-consecutive-elements-from-an-array-in-numpy — Siva-Sg, Jun 20 '18 at 06:04

AGN Gazer · Accepted Answer · 2018-06-20T14:32:03.200


Try this. It returns the longest run of consecutive nonzero values:

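import numpy as np
# data: the 1-D array from the question.
# Pad with one zero at each end so every nonzero run is bounded by zeros,
# then locate the zeros (indices refer to the padded array).
idx = np.flatnonzero(np.logical_not(np.pad(data, 1, 'constant', constant_values=0)))
# OR:
# idx = np.flatnonzero(np.logical_not([0] + list(data) + [0]))
# The largest gap between consecutive zeros marks the longest nonzero run.
k = np.argmax(np.ediff1d(idx))
# Shifting back by the 1-element padding gives the run in the original array.
data[idx[k]:idx[k+1]-1]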
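To check what the snippet returns, here is the same padding-and-gap idea wrapped in a function and run on a toy array. `longest_nonzero_run` is a hypothetical name; it returns the half-open range `[start, stop)` in the original (unpadded) array, so `data[start:stop]` is the slice the answer evaluates.

```python
import numpy as np

def longest_nonzero_run(data):
    """Return (start, stop) of the longest run of consecutive nonzeros.

    stop is exclusive, so data[start:stop] is the run itself.
    """
    padded = np.pad(data, 1, 'constant', constant_values=0)
    # Zeros in the padded array; every nonzero run lies between two of them
    idx = np.flatnonzero(padded == 0)
    k = np.argmax(np.ediff1d(idx))  # largest gap between consecutive zeros
    # Padded run is idx[k]+1 .. idx[k+1]-1 (inclusive); subtract 1 for padding
    return int(idx[k]), int(idx[k + 1] - 1)

data = np.array([0, 255, 0, 0, 510, 765, 255, 0, 255, 0])
start, stop = longest_nonzero_run(data)
print(start, stop, data[start:stop])  # 4 7 [510 765 255]
```

Note that this picks out only the single longest consecutive run; nonzero values separated from it by even one zero are not included.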

edited Jun 20 '18 at 14:32

answered Jun 20 '18 at 14:07

