How to combine split runs of spectral clustering for a huge affinity matrix

Question

Leading up to the question

I have a 2D complex valued image with a short series of values. I want to cluster similar pixels / segment the image. There is one more or less static image with a superimposed image that has some blobs in it that have a changing value (mostly the angle of the complex number) over the short series. They are also slightly discernable in the norm of the image.

My first attempt was k-means, but that really clustered according to the means (there is a distinction in mean values, especially compared to surrounding pixels, but the temporal and angular information is greater). My second attempt was ICA and then look at the k components with the largest magnitude, and that did successfully identify certain regions in my image as being different, but did not identify the group of pixels I was interested in (visually it is not hard to recognize them, but they are small).

Current situation

So because my first two tries did not work out, I looked around with google and it seemed spectral clustering might be appropriate. But I have some serious issues when using the method, mostly to do with limited available memory. I then thought, since I have so many pixels, I can just apply spectral clustering to seperate slabs of the data.

Someone here suggests clustering slabs first and then combine them, he then says 'at the end you will have the problem of recombining them and this problem can be solved easily'. The bits designated as 'easy' in explanations are hardly ever easy of course. He links to this paper, but that method does not process all the data in slabs. It rather excludes vectors that are not close to a principal component.

Question

My question has 2 parts:

1. How do I combine the results for the seperate segments? The eigenvectors are different and the cluster numbers are different. The result looks like it worked in the seperate slabs.

2. No distance / affinity between pixels in seperate slabs is taken into account. Can I make 'slabs between slabs'? For those slabs L and A are not symmetric, no clue how to perform the method then. Perhaps I can somehow compare / merge all eigenvectors at the end?

(3. Is there a similar or better method that does not need so much memory. Computation time is also borderline acceptable, easily exploding)

Matlab code example

%% generate data
% get some outer region without data
tempdisk = strel('disk',922/2); tempdisk = double(repmat((1+sqrt(-1)).*tempdisk.Neighborhood,[1 1 15]));
% make some noise
tempnoise = (rand(921,921,15)+sqrt(-1).*rand(921,921,15))./10;
% 'background signal'
tempim1 = double(imresize(mean(imread('cameraman.tif'),3),[921,921])); tempim1 = repmat(tempim1./max(tempim1(:)),[1 1 15]);
% 'target signal'
tempim2 = double(rgb2hsv(imread('fabric.png'))); tempim2 = imresize(tempim2(:,:,2),[921,921]); tempim2 = repmat(tempim2./max(tempim2(:)),[1 1 15]);
sin1 = repmat(permute(sin(2.*pi.*(0:14)./15),[1 3 2]),[921 921 1]);
% combine into data
complexdata = (sin1.*(1.0.*tempim1+0.5.*tempim2.^2).*exp(-sqrt(-1).*2.*pi.*sin1.*(tempim2.^2)).*tempdisk+tempnoise)./1.5;

%this is what the mean data looks like
meannorm = mean(abs(complexdata),3);
meanangle = mean(angle(complexdata),3);
figure; subplot(1,2,1); imshow(meannorm,[]); title('mean norm'); subplot(1,2,2); imshow(meanangle,[]); title('mean angle')

This is what the generated data looks like:

The bright blobs in the right image are what Im after. They have the strongest variation over time as well (and are correlated in time).

Then to set up the clustering:

%% perform spectral clustering in seperate slabs 
% method from http://ai.stanford.edu/~ang/papers/nips01-spectral.pdf
%get all pixel vectors in a single matrix
complexrows = reshape(permute(complexdata, [3,1,2]), [15, 921*921]);
%k means and eigs dont accept complex, so convert to real here?
complexrowsTranspose = [real(complexrows);imag(complexrows)]'; 

%lets say 10000 by 10000 matrices are still ok
npix = 10000;
nslabpix = floor(length(complexrowsTranspose)/npix);
nrestpix = rem(length(complexrowsTranspose), npix);

Perform spectral clustering in slabs that fit into memory:

% spectral clustering 
keig = 50;%how many eigenvectors needed? more is better
affinity_sigma = 1;% i dont understand how to calculate this from the paper
tic
% start with last slab (dynamically preallocate)
for islabpix = (nslabpix+1):-1:1;
    %print progress
    islabpix/nslabpix
    toc
    if islabpix>nslabpix
        pixrange = (1:nrestpix) + ((islabpix-1)*npix);;
    else
        pixrange = (1:npix) + ((islabpix-1)*npix);
    end
    %calculate affinity between all voxels in slab
    Aff = exp( -squareform(pdist(complexrowsTranspose(pixrange,:))).^2/(2*affinity_sigma^2) ); % affinity matrix
    %calculate degree matrix for normalization
    Dsq = sparse(size(Aff,1),size(Aff,2)); %degree matrix
    for idiag=1:size(Aff,1)
        Dsq(idiag,idiag) = sum(Aff(idiag,:))^(1/2);
    end
    %normalize affinity matrix
    Lap = Dsq * Aff * Dsq; %normalize affinity matrix
    %calculate eigenvectors of affinity matrix
    [eigVectors(pixrange,1:keig), eigValues] = eigs(Lap, keig); %eigenvectors of normalized aff mat
    normEigVectors(pixrange,1:keig) = eigVectors(pixrange,1:keig)./repmat(sqrt(sum(abs(eigVectors(pixrange,1:keig)).^2,2)), [1 keig]); %normalize rows of eigen vectors, normr only works on real numbers
    % perform k means clustering on weights for eigenvectors
    [idx,C,sumd,D] = kmeans([real(normEigVectors(pixrange,1:keig)),imag(normEigVectors(pixrange,1:keig))], 5); %k means on normalized eigenvecotrs

    idxval(pixrange) = idx;
end
%reshape into image
idxim = reshape(idxval, [921, 921]);
figure; imshow(idxim,[])
toc

The resulting clustering:

The result looks like the method is working to some degree within each slab; the goal was to cluster all blobs with slightly higher norm and stronger angle variation (high saturation blobs from tempim2), which seem recognizable in the result. Now its mostly the seperate slabs that are the issue and that there are no cross-slab clusters. This took my computer about 15 minutes. I reduced the number of eigenvalues and the image size for this example so it runs in an acceptable amount of time. I think that illustrates part of my problem.

please provide a link in the question to `sparse_adj_matrix`. if you are not getting the results you want, consider using different ways to compute the pair-wise affinities, in a way that will capture what you consider as "similar" (similar phase, etc.). Have you considered posting the second part of your question as an answer? — Shai, May 24 '18 at 05:37
I added a link and moved the update to an answer. I'm still trying some different inputs to calculate the distance / affinity, just very slow because eigs runs for so long. My first try didnt even finish overnight. — Leo, May 24 '18 at 10:16

Shai · Accepted Answer · 2018-05-24T06:34:27.723

I don't really have an answer for you, but I think these pointers should help you find an answer:

You claim to have memory problems. Are you sure your affinity matrix is sparse? It seems like only the diagonal degree matrix is sparse in your code. Usually when running spectral clustering on pixels/voxels one designs the affinity matrix to be very sparse (8 connect or 26 connect).
You describe your clusters as "they are small". Spectral clustering has known issues with clusters in very different scales. Are you sure you are getting satisfactory results?
How do you compute the affinity (similarity) between neighboring voxels? Can you measure dissimilarity as well? That is, can you say for some voxels that they should not belong to the same cluster? If so, have you considered using correlation clustering? This method is more robust to different cluster scales and can automatically detect the number of clusters.
Have you considered using multiscale/multigrid methods to coarsen your data instead of brutally slicing it into "slabs"?
Have you looked at spectralNet? If I am not mistaken, this method should enable you to "learn" the spectral clustering on part of the points and then use the net to "extrapolate" the clustering to new points.

Upadate:
In light of Leo's comment, I would say that when it comes to spectral clustering of very large data, brutally slicing the data into "slabs" and then trying to "stitch" the results together might not be the best coarse of action (not that I think it is not possible). A better way to approach the problem is by significantly sparsifying the affinity matrix: compute pair-wise affinities for each point only to its neighbors, resulting with affinity matrix that is mostly sparse. This way one can process all the points at once without the need to "slice" and "stitch".

As for the difference between spectral clustering and correlation clustering:
Why spectral clustering is able to cluster all points even when the input affinity matrix is so sparse? how can it tell that point a and a far away point c should belong in the same cluster even when no affinity was computed between them?
The simple answer is transitivity of affinities: if a is similar to b and b is similar to c then a and c should be clustered together.
Where's the catch? In spectral clustering all entries in the affinity matrix are non-negative, which means that unless there is absolutely no path connecting a and c (slim chance) there is some "transitive affinity" suggesting a and c should belong to the same cluster. Therefore, if look at the math of spectral clustering you'll notice that the "trivial solution", i.e., placing all points in the same cluster, provides a global optimum to the problem. One must artificially force the solution to have k clusters to avoid the trivial solution.
What can be done? If you only consider positive affinities the value 0 is ambiguous: it means either "I didn't bother to compute the affinities between these points", but it can also mean "I think these two points should not be in the same cluster". To overcome this ambiguity we can introduce negative affinities this way if A(i, j) > 0 means point i and point j should be in the same cluster with certainty A(i, j), while if A(i, j) < 0 means i and j should not be in the same cluster (with certainty |A(i, j)|). Introducing negative affinities breaks the "transitivity chains" that may link far away points, no it is no longer trivial to place all points in the same cluster.
How to take advantage of negative affinities? When your affinity matrix has both positive (attraction) and negative (repulsion) values, you can cluster the points using correlation clustering which basically tries to maximize the affinities/attraction between points within each cluster and simultaneously maximize the repulsion between points in different clusters. A nice property of correlation clustering is that it "automatically" discover the underlying number of clusters, see sec. 2 of this paper.

Great suggestions, thank you. For your questions: 1)I made Dsq sparse to save memory, A is not sparse. 2)I kind of cheat by excluding wildy different valued voxels beforehand, but it might still be an issue. 3)In the declaration of Aff I compute dissimilarity first (pdist) then convert it to similarity. I stumbled upon spectral clustering because I was correlating all my voxels to all my voxels, but didnt know how to cluster the results. Correlation clustering sounds perfect. 4) Good suggestion, guess this is too quick and dirty. 5) Will definitely check out spectralNet! thanks. — Leo, May 23 '18 at 07:32
@Leo one of the many advantages of spectral clustering (and correlation clustering) is the fact that you do not need to explicitly compute affinity/distances between ALL points. It is usually enough to compute distances/affinities to only a limited number of neighboring points and the "transitivity" property of affinity will propagate to further points. Maybe you can restrict the computation of `Aff` to be very sparse and then you can process all points together. — Shai, May 23 '18 at 07:53
I could make `Aff` sparse by only computing is for a subset of voxels. I think that is what you mean. But then after calculating the eigenvectors of the resulting `L`, Im not sure what is a good way to get values for the voxels that I did not put in. Perhaps perform a regression of all my voxels that I didnt put in with the resulting eigenvectors as regressors / in the design matrix? Sounds like it would work. — Leo, May 23 '18 at 08:04
@Leo all voxels should participate in `L`, but you do not need to compute ALL affinities for each voxel, you can (and should) compute only affinities to nearby voxels (making most of entries in each row zeros). — Shai, May 23 '18 at 08:38
Thanks for pointing that out, I did not understand that. Really helpful. Im working on it now. Immediately ran into memory issues when trying to select 'nearby' voxels with `[XX, YY] = ndgrid(1:size(meanm,1),1:size(meanm,2)); DXY = pdist([XX(:),YY(:)]);`. [Your function](http://www.wisdom.weizmann.ac.il/~bagon/matlab_code/sparse_adj_matrix.m) for a sparse adjacency matrix is a life saver! — Leo, May 23 '18 at 09:37
@Leo glad you found that. I was thinking it would only confuses you. — Shai, May 23 '18 at 10:33
Reading back my question, and especially the title 'how to combine slabs of seperate spectral clustering runs', you may have answered it already. Is the answer 'you cant' or is the answer 'you dont have to' combine such slabs (if you calculate affinity for fewer combinations of voxels)? If you write a little conclusion at the end of your answer that mentions 'you cant' or 'you dont have to', then you pretty much answered the question and I'll click accept. Maybe also explain *why* only affinities between nearby pixels are needed and what the difference with correlation clustering is? — Leo, May 23 '18 at 16:14

Leo · Answer 2 · 2018-05-25T06:57:17.120

After Shai's answer I switched to only calculating the affinity between pixels that are adjacent (within radius of 4 pixels) and using sparse matrices. I then get equal clustering for the entire image. To make a sparse adjacency matrix I use Shai's function sparse_adj_matrix, otherwise memory is filled just by the adjacency matrix alone.

tic
complexrowsTranspose = [real(complexrows);imag(complexrows)]';
indexnonzero = find(mean(tempdisk,3) > 0);
idxval = zeros(size(complexrowsTranspose, 1),1);

[irow jcol] = sparse_adj_matrix([size(meannorm,1), size(meannorm,2)], 4, 2);
keep = ismember(irow, indexnonzero);
irow(~keep) = [];
jcol(~keep) = [];
clear keep

sigma = 1;
Avect = exp(-sum((complexrowsTranspose(irow,:)-complexrowsTranspose(jcol,:)).^2,2)/(2*sigma^2));
iAval = find([Avect>0].*[Avect<Inf]);
Aff = sparse(irow(iAval),jcol(iAval),Avect(iAval),length(complexrowsTranspose),length(complexrowsTranspose));
Dvector = sum(Aff,2).^(1/2);
Dindex = find(Dvector);
D = sparse(Dindex, Dindex, Dvector(Dindex),size(Aff,1),size(Aff,2));
L = D * Aff * D;

keig = 25;
[Vect, Val] = eigs(L, keig);
normVect = Vect./repmat(sqrt(sum(abs(Vect).^2,2)), [1 keig]);
[kidx,kC,ksumd,kD] = kmeans(normVect, 5);

kmeansim = reshape(kidx, [921, 921]);
figure; imshow(kmeansim,[])
toc

This is what the resulting clusters look like:

Looks much better. However the clusters that I`m interested in dont show up (the blobs with high values in the 'angle' image, inside the dark coat of the photographer). Especially the pixels with similar mean norm values are clustered, not with similar changes over the short series nor with similar angles (of the complex values).

I'll try tweaking the input values and the radius of adjacency to calculate affinities for.

Update

I tried putting in just the angles, the entire complex values (and adapting code to work for complex values), changing the radius within which affinities were calculated, putting in 1-correlation instead of distance, but I did not get the small bright angle blobs in the photographers coat to be clustered in a group.

I then downloaded this code and tried it like below:

complexrowsTranspose = complexrows';
[icol irow] = sparse_adj_matrix([921, 921], 1, Inf);

complexrowsTminmean = complexrowsTranspose -repmat(mean(complexrowsTranspose , 2), [1, 15]);
complexrowsTstd = sqrt( std(real(complexrowsTranspose), [], 2).^2+std(imag(complexrowsT), [], 2).^2 );
complexrowsTcorr = sum(real(complexrowsTminmean(icol).*complexrowsTminmean(irow)), 2)./complexrowsTstd(irow)./complexrowsTstd(icol)./(15-1);

Asparse = sparse(irow, icol, 1-complexrowsTcorr, 921*921, 921*921);
Asparse(isnan(Asparse)) = 0;

K = AL_ICM(Asparse);

But I dont get it to move beyond the first iteration. The way I calculate correlation for these complex vectors might not satisfy the requirements of the function.

if you are using ICM, you should work [multiscale](https://github.com/shaibagon/discrete_multiscale) — Shai, May 25 '18 at 08:44

How to combine split runs of spectral clustering for a huge affinity matrix

2 Answers2