
Problem Formulation

  • Suppose I have several 10000 × 10000 grids (each can be viewed as a 10000 × 10000 grayscale image; I use "image" and "grid" interchangeably below), and each grid point holds a value (in my case, the number of copies of a specific gene expressed at that pixel location; the locations are the same for every grid). I want to quantify the similarity between two 2D spatial point patterns of this kind (i.e., the spatial expression patterns of two distinct genes) and rank all pairs of genes from "most similar" to "most dissimilar". Note that I do not care about the spatial pattern of absolute expression levels; I care about the relative pattern. As a result, I might need a correlation measure rather than a distance metric when comparing corresponding pixels.

  • The easiest method might be to treat all pixels together as a vector and calculate a correlation between the two vectors. However, this ignores spatial information. The genes I am most interested in have spatial patterns, i.e., clustering and autocorrelation affect their expression (though a "cluster" might take a very thin shape rather than forming a compact blob, e.g., genes specific to skin cells). As a result, the image usually has a few local peak regions, while expression at all other pixels is extremely low (near 0).
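A minimal sketch of this flattened-vector baseline, assuming each gene's grid is a NumPy array of the same shape stored in a dict keyed by gene name (`pearson_flat` and `rank_gene_pairs` are hypothetical names). Pearson correlation is unchanged when either grid is rescaled as a·x + b with a > 0, which matches the "relative pattern" requirement; it is also unchanged when both grids are shuffled with the same pixel permutation, which is exactly why it ignores spatial layout. At full resolution each float64 copy of a 10000 × 10000 grid is about 800 MB, so coarsening first is probably needed in practice.

```python
import itertools

import numpy as np


def pearson_flat(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation of two equally shaped grids treated as flat vectors.

    Pixel positions play no role: applying the same permutation to both
    grids leaves the result unchanged.
    """
    x = a.ravel().astype(float)  # astype copies, so the inputs are untouched
    y = b.ravel().astype(float)
    x -= x.mean()
    y -= y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    # a constant grid has zero variance; report 0 instead of dividing by 0
    return float((x * y).sum() / denom) if denom > 0 else 0.0


def rank_gene_pairs(grids: dict) -> list:
    """Return (gene_a, gene_b, r) tuples sorted from most to least correlated."""
    scores = [
        (g1, g2, pearson_flat(grids[g1], grids[g2]))
        for g1, g2 in itertools.combinations(sorted(grids), 2)
    ]
    return sorted(scores, key=lambda t: t[2], reverse=True)
```

Spearman correlation (Pearson on ranks, e.g. via `scipy.stats.spearmanr`) is a drop-in alternative if a few extreme pixels dominate the Pearson value.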

Possible Directions

  • I am not sure whether I should (1) apply image-similarity algorithms from image processing that account for local structure (e.g., SSIM or SIFT, as outlined in Simple and fast method to compare images for similarity); (2) apply spatial-similarity methods from spatial statistics in GIS (there are papers on this, but I am not sure whether any of them handle simple point data rather than the usual region data with shapes; in GIS terms, I need an algorithm for raster data rather than polygon data); or (3) directly apply statistical methods for discrete 2D distributions, which might be a bit crude because they seem to disregard regional clustering/autocorrelation effects (cf. Tobler's First Law of Geography).

  • For direction (1), I am considering a simple method: first find some "peak" regions in each of the two images and take their union as the ROIs, then compare the two images within those ROIs pixel by pixel (treating the ROI pixels together as a vector). However, I am not sure whether I can replace the distance metrics with correlation metrics, and I am a bit worried that many image-processing similarity measures might not work well when the two images are dissimilar. Direction (2) might be more appropriate, because this problem is indeed a spatial-statistics problem, but I do not yet know where to start in GIS. I guess direction (3) is largely subsumed by (2), so I might not consider it here.
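The ROI idea from direction (1) can be sketched as follows, assuming "peak" pixels are defined by a simple quantile threshold on each grid's nonzero values. The threshold rule, the default `q=0.99`, and the function names are hypothetical placeholders for whatever peak or hotspot detector is actually used.

```python
import numpy as np


def peak_mask(grid: np.ndarray, q: float = 0.99) -> np.ndarray:
    """Boolean mask of 'peak' pixels: nonzero pixels at or above the q-th quantile.

    The quantile rule is an illustrative assumption, not a recommended detector.
    """
    nonzero = grid[grid > 0]
    if nonzero.size == 0:
        return np.zeros(grid.shape, dtype=bool)
    thresh = np.quantile(nonzero, q)
    return (grid >= thresh) & (grid > 0)


def roi_correlation(a: np.ndarray, b: np.ndarray, q: float = 0.99) -> float:
    """Pearson correlation restricted to the union of both grids' peak regions."""
    roi = peak_mask(a, q) | peak_mask(b, q)
    if roi.sum() < 2:
        return 0.0
    # boolean indexing visits pixels in the same row-major order for both grids
    x = a[roi].astype(float)
    y = b[roi].astype(float)
    x -= x.mean()
    y -= y.mean()
    denom = np.sqrt((x * x).sum() * (y * y).sum())
    return float((x * y).sum() / denom) if denom > 0 else 0.0
```

Using the union rather than the intersection speaks to the worry about dissimilar images: where one gene peaks and the other is near zero, those pixels pull the correlation negative instead of being silently excluded.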

Sample

Sample image (there are some issues with my own data, so I borrowed an image from SpatialLIBD: http://research.libd.org/spatialLIBD/reference/sce_image_grid_gene.html):

[Image: sample gene-expression grid from SpatialLIBD]

Let's say each pixel takes an integer value between 0 and 10 (which could be scaled to [0, 1] if needed). The tissue shapes in the left and right panels differ slightly, but in my case they are exactly the same.

PS: There is one potentially serious problem for the spatial-statistics approach, though. The expression of certain marker genes for a specific cell type might not form a compact cluster, but rather a thin layer or an irregular shape. For example, if the grid is a brain section, the high-expression region for cortical layer-specific genes (e.g., Ctip2 for layer V) might form a thin, curved arc across the 10000 × 10000 grid.

UPDATE: I found an approach in direction (3), the "optimal transport" problem, that might be useful. It appears to incorporate locality information into the comparison of distributions. I will try it tomorrow (it seems to be the easiest of the three directions to code).
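A minimal optimal-transport sketch, assuming both grids have the same shape and are first sum-pooled to a coarse resolution (a full 10000 × 10000 grid would need a 10^8 × 10^8 cost matrix). It swaps exact OT for entropy-regularized OT solved with Sinkhorn iterations; the regularization `eps`, the pooling factor, and the function names are assumptions. Both grids are normalized to sum to 1, so only the relative spatial distribution is compared, and the squared Euclidean ground cost is how locality enters: moving mass to a nearby pixel is cheap, moving it across the section is expensive.

```python
import numpy as np


def coarsen(grid: np.ndarray, factor: int) -> np.ndarray:
    """Sum-pool a 2D grid by an integer factor, cropping edges that don't fit."""
    h = (grid.shape[0] // factor) * factor
    w = (grid.shape[1] // factor) * factor
    g = grid[:h, :w].astype(float)
    return g.reshape(h // factor, factor, w // factor, factor).sum(axis=(1, 3))


def sinkhorn_cost(a: np.ndarray, b: np.ndarray,
                  eps: float = 0.05, n_iter: int = 500) -> float:
    """Transport cost of the entropy-regularized OT plan between two 2D grids.

    Both grids must share a shape and have a positive total. Coordinates are
    divided by the longer side so they lie in [0, 1); with the default eps the
    kernel entries stay far from floating-point underflow.
    """
    assert a.shape == b.shape, "grids must share a shape"
    p = a.ravel().astype(float)
    p /= p.sum()
    q = b.ravel().astype(float)
    q /= q.sum()
    rows, cols = np.indices(a.shape)
    coords = np.stack([rows.ravel(), cols.ravel()], axis=1) / max(a.shape)
    # squared Euclidean distance between every pair of pixel centres
    cost = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-cost / eps)  # Gibbs kernel of the cost matrix
    u = np.ones_like(p)
    for _ in range(n_iter):  # alternately match the column and row marginals
        v = q / (kernel.T @ u)
        u = p / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]
    return float((plan * cost).sum())
```

With `eps > 0` the plan is blurred, so even identical grids score slightly above zero, and a much smaller `eps` can underflow the kernel (log-domain Sinkhorn avoids this). The POT library's `ot.emd2` (exact) and `ot.sinkhorn2` (regularized) are tested alternatives to this hand-rolled loop.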

Any thoughts would be greatly appreciated!

user48867

1 Answer


In the absence of any sample image, I am assuming that your problem is similar to texture-pattern recognition.

We can start with Local Binary Patterns (2002), or LBPs for short. Unlike earlier (1973) texture features, which compute a global representation of texture from the Gray Level Co-occurrence Matrix, LBPs compute a local representation of texture by comparing each pixel with its surrounding neighborhood. For each pixel in the image, we select a neighborhood of radius r (which allows variable neighborhood sizes) around the center pixel. An LBP value is then calculated for this center pixel and stored in an output 2D array with the same width and height as the input image. You can then compute a histogram of the LBP codes (the final feature vector) and apply machine learning for classification.
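To make the per-pixel step concrete, here is a minimal NumPy sketch of the original 3 × 3 LBP: 8 neighbours at radius 1, no interpolation. It is a simplification for illustration, not scikit-image's circular, interpolated `local_binary_pattern`, and because it has no "uniform" mapping it is not rotation-invariant. The function names are hypothetical.

```python
import numpy as np

# (dy, dx) offsets of the 8 neighbours, walked clockwise from the top-left
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                     (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_3x3(image: np.ndarray) -> np.ndarray:
    """8-bit LBP code for every interior pixel of a 2D array.

    Bit k is set when neighbour k is >= the centre pixel. Border pixels have
    an incomplete neighbourhood and are dropped, so the output is (H-2, W-2).
    """
    img = image.astype(float)
    h, w = img.shape
    centre = img[1:h - 1, 1:w - 1]
    codes = np.zeros(centre.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(NEIGHBOUR_OFFSETS):
        # the same-shaped window shifted by (dy, dx) relative to the centres
        neighbour = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neighbour >= centre).astype(np.int64) << bit
    return codes


def lbp_histogram(image: np.ndarray) -> np.ndarray:
    """Normalized 256-bin histogram of LBP codes (the feature vector)."""
    counts = np.bincount(lbp_3x3(image).ravel(), minlength=256)
    return counts / counts.sum()
```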

LBP implementations can be found in both scikit-image and OpenCV, but the latter's implementation is strictly in the context of face recognition: the underlying LBP extractor is not exposed for raw LBP histogram computation. The scikit-image implementation offers more control over the types of LBP histograms you can generate, and it also includes LBP variants that improve rotation and grayscale invariance.

Some starter code:

```python
from skimage import feature
from sklearn.svm import LinearSVC
from imutils import paths
import numpy as np
import argparse
import cv2
import os

# parse the paths to the training and testing image folders
ap = argparse.ArgumentParser()
ap.add_argument("-t", "--training", required=True,
    help="path to the training images")
ap.add_argument("-e", "--testing", required=True,
    help="path to the testing images")
args = vars(ap.parse_args())

class LocalBinaryPatterns:
    def __init__(self, numPoints, radius):
        # store the number of points and radius
        self.numPoints = numPoints
        self.radius = radius

    def describe(self, image, eps=1e-7):
        # compute the Local Binary Pattern representation
        # of the image, and then use the LBP representation
        # to build the histogram of patterns; "uniform" LBP
        # yields numPoints + 2 distinct codes (0 .. numPoints + 1)
        lbp = feature.local_binary_pattern(image, self.numPoints,
            self.radius, method="uniform")
        (hist, _) = np.histogram(lbp.ravel(),
            bins=np.arange(0, self.numPoints + 3))
        # normalize the histogram
        hist = hist.astype("float")
        hist /= (hist.sum() + eps)
        # return the histogram of Local Binary Patterns
        return hist

# initialize the local binary patterns descriptor along with
# the data and label lists
desc = LocalBinaryPatterns(24, 8)
data = []
labels = []

# loop over the training images
for imagePath in paths.list_images(args["training"]):
    # load the image, convert it to grayscale, and describe it
    image = cv2.imread(imagePath)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    hist = desc.describe(gray)
    # the class label is the name of the image's parent folder;
    # update the label and data lists
    labels.append(imagePath.split(os.path.sep)[-2])
    data.append(hist)

# train a Linear SVM on the data
model = LinearSVC(C=100.0, random_state=42)
model.fit(data, labels)
```

Once our Linear SVM is trained, we can use it to classify subsequent texture images:

```python
# loop over the testing images
for imagePath in paths.list_images(args["testing"]):
    # load the image, convert it to grayscale, describe it,
    # and classify it
    image = cv2.imread(imagePath)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    hist = desc.describe(gray)
    prediction = model.predict(hist.reshape(1, -1))

    # display the image and the prediction
    cv2.putText(image, prediction[0], (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
        1.0, (0, 0, 255), 3)
    cv2.imshow("Image", image)
    cv2.waitKey(0)
```

Have a look at this excellent tutorial for more details.

Ravi Kumar (2016) was able to extract finer texture detail by combining LBP with Gabor filters, which are used to filter the coefficients of the LBP pattern.

Abhi25t
  • Thanks for your great answer! I've edited the question to include a sample image. One specific characteristic of my problem is that I do not want invariance to "rotation" or "scaling (size)", which I suppose is an important part of ordinary image comparison. In my case, a rotated copy of a pattern should not count as similar. After a brief look at their figures, I'm not completely sure whether the algorithm described in the Ojala et al. (2002) paper is still applicable. – user48867 Jan 27 '21 at 10:04