How to locate a particular "region" of values in a 2D numpy array?

Question

I am working with a 2D numpy array made of 101x101=10201 values. Such values are of float type and range from 0.0 to 1.0. The array has an X,Y coordinate system which originates in the top left corner: thus, position (0,0) is in the top left corner, while position (101,101) is in the bottom right corner.

This is how the 2D array looks like (just an excerpt):

X,Y,Value
0,0,0.482
0,1,0.49
0,2,0.496
0,3,0.495
0,4,0.49
0,5,0.489
0,6,0.5
0,7,0.504
0,8,0.494
0,9,0.485

I would like to be able to:

1) Count the number of regions of cells (see image below) which value exceeds a given threshold, say 0.3;

2) Determine the distance between the visual centers of such regions and the top left corner, which has coordinates (0,0).

How could this be done in Python 2.7?

This is a visual representation of a 2D array with 2 regions highlighted (the darker the color, the higher the value):

Do you have access to opencv module `cv2`? Such tasks basically show up in image processing domain? — Divakar, Nov 14 '15 at 09:50
Never heard of it. Could you please give me some more details? — FaCoffee, Nov 14 '15 at 09:50

score 13 · Accepted Answer · answered Nov 14 '15 at 13:28

13

You can find which pixels satisfy your cut-off using a simple boolean condition, then use scipy.ndimage.label and scipy.ndimage.center_of_mass to find the connected regions and compute their centers of mass:

import numpy as np
from scipy import ndimage
from matplotlib import pyplot as plt

# generate some lowpass-filtered noise as a test image
gen = np.random.RandomState(0)
img = gen.poisson(2, size=(512, 512))
img = ndimage.gaussian_filter(img.astype(np.double), (30, 30))
img -= img.min()
img /= img.max()

# use a boolean condition to find where pixel values are > 0.75
blobs = img > 0.75

# label connected regions that satisfy this condition
labels, nlabels = ndimage.label(blobs)

# find their centres of mass. in this case I'm weighting by the pixel values in
# `img`, but you could also pass the boolean values in `blobs` to compute the
# unweighted centroids.
r, c = np.vstack(ndimage.center_of_mass(img, labels, np.arange(nlabels) + 1)).T

# find their distances from the top-left corner
d = np.sqrt(r*r + c*c)

# plot
fig, ax = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(10, 5))
ax[0].imshow(img)
ax[1].hold(True)
ax[1].imshow(np.ma.masked_array(labels, ~blobs), cmap=plt.cm.rainbow)
for ri, ci, di in zip(r, c, d):
    ax[1].annotate('', xy=(0, 0), xytext=(ci, ri),
                   arrowprops={'arrowstyle':'<-', 'shrinkA':0})
    ax[1].annotate('d=%.1f' % di, xy=(ci, ri),  xytext=(0, -5),
                   textcoords='offset points', ha='center', va='top',
                   fontsize='x-large')
for aa in ax.flat:
    aa.set_axis_off()
fig.tight_layout()
plt.show()

answered Nov 14 '15 at 13:28

ali_m

71,714
23
223
298

This is great! But rather than creating a test image, what if I have my own image, already worked out, as a `jpg` file in a given folder? How would the `blobs=img>0.75` change to account for my `jpg` images? Alternatively, I have as many `csv` files as there are `jpg` images I can work with. – FaCoffee Nov 14 '15 at 16:00
Getting the data into a 2D numpy array is a separate issue really. You could use [`scipy.ndimage.imread`](http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.misc.imread.html) to read a `.jpg` or [`numpy.genfromtxt`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html) to read a `.csv` file. In the latter case you will probably have to sort the `Value` column by ascending `Y` and `X` coordinates, then reshape it into a 2D array ([see here](http://stackoverflow.com/q/32129572/1461210)). – ali_m Nov 14 '15 at 16:09
By the way, `.jpg` is not a great choice of format for storing arrays because it uses [lossy compression](https://en.wikipedia.org/wiki/JPEG#JPEG_compression) - you will always lose come information about the exact pixel values in the process of saving the image. It's probably a better idea to save as numpy's native `.npy` format, or if you must be able to view the arrays as images, use a lossless format like `.png` or `.tif`. – ali_m Nov 14 '15 at 16:26
Ok, but I wonder if saving them as `png` for instance does require more memory on my hd. Consider that the number of `jpg` images I would need to recreate and turn into `png` is 1200. Btw, how would that line change if I had `png` images only? – FaCoffee Nov 14 '15 at 17:16
Perhaps you should ask a separate question about this. My point was that image formats are probably not the best way to store your data (although I would expect even a `.png` image to be much more efficient than a text-based format such as `.csv`). – ali_m Nov 14 '15 at 17:20
One question regarding the blobs you posted. What is the measurement unit for distance `d`? It is `cells`, right? With the maximum value being `512*sqrt(2)=724.08`? – FaCoffee Nov 14 '15 at 17:31
2

`r`, `c` and `d` are all measured in pixels. – ali_m Nov 14 '15 at 17:33
I think it all originates from `img = gen.poisson(2, size=(512, 512))`, right? – FaCoffee Nov 14 '15 at 17:34
1

Yes, in the sense that it generates a 512x512 array. `r` and `c` refer to row and column locations within that array. – ali_m Nov 14 '15 at 17:37
I have some problems in implementing this code, as I don't understand the `np.vstack()` function. As said, I don't need to generate test images as I already have mine. I was able to import them and transform each one of them into a 2D `numpy.ndarray`. Now, can I select regions of this array where certain thresholds are exceeded, rather than working on the image itself? – FaCoffee Nov 18 '15 at 13:12
What is `nlabels`? Is it used to distinguish regions based on their label? – FaCoffee Nov 18 '15 at 13:51
1

`nlabels` is the number of connected regions found by `label()`, such that `labels.max() == nlabels`. This is all explained [in the documentation for `scipy.ndimage.label`](http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.ndimage.measurements.label.html). – ali_m Nov 18 '15 at 13:54
One last question: what do you mean by `weighted centroids`? Does it refer to the measurement unit used for computing distance, which in my case is `pixels`? – FaCoffee Nov 18 '15 at 14:00
2

When computing the centre of mass, you could either treat all pixels that are above the threshold as equal (i.e. pixels are either 1 or 0), or you could take into account their actual values within the original image, such that pixels with greater values have a greater influence on the centre of mass for each region. In my code the two cases would be equivalent to passing either `blobs` or `img` as the first argument to `center_of_mass` respectively. – ali_m Nov 18 '15 at 14:07
So, if I use `img`, I guess my distance will be weighted. Correct? Btw, how does the thing work, since the output of `img=mymatrix>0.75` only contains True or False values? How can it distinguish which value weighs more than others? – FaCoffee Nov 18 '15 at 14:11
1

Look at my code again - my `img` variable is equivalent to your `mymatrix`, and my `blobs` is equivalent to your `img`. The comments are getting a bit out of hand, so I'm not going to answer any more questions here. – ali_m Nov 18 '15 at 14:18
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/95449/discussion-between-francesco-castellani-and-ali-m). – FaCoffee Nov 18 '15 at 14:32
@ali_m, the answer gave exiting results, hence the interest and more comments. – Gathide May 27 '16 at 10:17

How to locate a particular "region" of values in a 2D numpy array?

1 Answers1

Linked