I have been running Louvain community detection in R using igraph (with thanks to this answer to my previous question). However, I found that the cluster_louvain method assigned group membership in a strange way, which I think was due to an error in how I imported my data. Although I think I have resolved this, I would like to understand what the problem was.

I ran Louvain clustering on a 400x400 correlation matrix (i.e. correlation scores for 400 individuals). When I initially imported my data, my correlation matrix had the same individuals' ID numbers (i.e. vertex numbers) for both the row and column headings, as below:

    1     2     3     4   ... 400 
1   0     0.8   0.7   0.1 
2   0.8   0     0.6   0.3
3   0.7   0.6   0     0.9
4   0.1   0.3   0.9   0                    
...
400                          

This correlation matrix was saved in a "Correlations.csv" file, which I imported using read.csv. I then used the code below to convert it to a distance matrix, set entries whose correlation fell below a threshold to zero, turn the result into a weighted adjacency matrix for igraph, and run cluster_louvain (this code is also provided in the answer here):

library(psych)   # cor2dist()
library(igraph)  # graph.adjacency(), cluster_louvain()

correlationmatrix <- read.csv("Correlations.csv", header = TRUE,
                              row.names = 1, check.names = FALSE)

distancematrix <- cor2dist(correlationmatrix)
DM2<- as.matrix(distancematrix)
DM2[correlationmatrix < 0.33] = 0

G2 <- graph.adjacency(DM2, mode = "undirected", weighted = TRUE, diag = TRUE)
clusterlouvain <- cluster_louvain(G2)

sizes(clusterlouvain)
Community sizes
  1   2
200 200
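
Before clustering, a few structural checks on the matrix passed to igraph can help narrow down where things go wrong. The following is a minimal sketch in base R, run on a small made-up matrix (`DM2_toy` and its values are hypothetical stand-ins for `DM2`, not my real data); the same checks could be applied to `DM2` itself.

```r
# Toy stand-in for DM2; the values are illustrative only.
ids <- c("1", "2", "3", "4")
DM2_toy <- matrix(
  c(0,   0.8, 0.7, 0,
    0.8, 0,   0.6, 0,
    0.7, 0.6, 0,   0.9,
    0,   0,   0.9, 0),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)

# Structural checks. unname() is used for the symmetry test because
# isSymmetric() also compares row names against column names.
checks <- list(
  is_numeric   = is.numeric(DM2_toy),
  is_square    = nrow(DM2_toy) == ncol(DM2_toy),
  is_symmetric = isSymmetric(unname(DM2_toy)),
  names_match  = identical(rownames(DM2_toy), colnames(DM2_toy)),
  zero_diag    = all(diag(DM2_toy) == 0)
)
str(checks)

# Number of non-zero off-diagonal entries, i.e. edges that survive the threshold.
n_edges <- sum(DM2_toy[upper.tri(DM2_toy)] != 0)
n_edges
```

If any of these checks differ between the two imported files, that would point to the import step rather than to cluster_louvain.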

I then wanted to get the cluster number beside each ID number, to know which individual belonged to each community. In the list of vertex IDs, the membership beside them was listed as '1 2 1 2 1 2 1 2', which was obviously not right (we would not expect every alternate individual in the dataset to be assigned to a different community):

IDs_cluster <- cbind(V(G2)$name, clusterlouvain$membership)
IDs_cluster

ID  Membership
1   1
2   2 
3   1
4   2
5   1
6   2
…
400 2
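
As an aside on the pairing step itself: `cbind()` on a character vector and a numeric vector produces a character matrix, so a data frame may be easier to inspect. Below is a minimal sketch on a tiny hypothetical graph (the matrix `adj` and its IDs are made up for illustration), assuming the igraph package is installed.

```r
library(igraph)

# Hypothetical 4-vertex graph with two separate pairs: 1-2 and 3-4.
ids <- c("1", "2", "3", "4")
adj <- matrix(
  c(0, 1, 0, 0,
    1, 0, 0, 0,
    0, 0, 0, 1,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE, dimnames = list(ids, ids)
)

g  <- graph_from_adjacency_matrix(adj, mode = "undirected",
                                  weighted = TRUE, diag = FALSE)
cl <- cluster_louvain(g)

# Pair each vertex name with its community, keeping the column types intact.
ids_cluster <- data.frame(
  ID         = V(g)$name,
  Membership = as.integer(membership(cl)),
  stringsAsFactors = FALSE
)
ids_cluster
```

Comparing `V(g)$name` against the row names of the input matrix is a quick way to confirm that the memberships line up with the intended IDs.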

From looking at other datasets I realised the problem might have been that the column headings in my correlation matrix were numerical. So I changed the correlation matrix so that the row headings were still the ID numbers, but the column headings were `V1` to `V400`:

    V1    V2    V3    V4   ... V400 
1   0     0.8   0.7   0.1 
2   0.8   0     0.6   0.3
3   0.7   0.6   0     0.9
4   0.1   0.3   0.9   0                    
...
400

I imported this as a .csv file and re-ran ‘cluster_louvain’, as below:

correlationmatrix_V <- read.csv("Correlations_withV.csv", header = TRUE,
                                row.names = 1, check.names = FALSE)

distancematrix_V <- cor2dist(correlationmatrix_V)
DM2_V <- as.matrix(distancematrix_V)
DM2_V[correlationmatrix_V < 0.33] = 0

G2_V <- graph.adjacency(DM2_V, mode = "undirected", weighted = TRUE, diag = TRUE)
clusterlouvain_V <- cluster_louvain(G2_V)

Now when I reran cluster_louvain, it generated a more sensible result of three clusters, with group membership to each cluster looking more like what we would expect:

sizes(clusterlouvain_V)
Community sizes
  1   2   3
168  52 180

IDs_cluster <- cbind(V(G2_V)$name, clusterlouvain_V$membership)
View(IDs_cluster)
ID  Membership
1   1
2   1 
3   3
4   2
5   2
6   2
…
400 1

My question is: could someone clarify what happened when the row and column headings were the same, such that group membership alternated between individuals (i.e. '1 2 1 2' down the ID list, as in the first example), and why this was resolved by changing the column headings to a non-numerical format (as in the second example)?

This may be a simple mistake in that when importing the .csv of the correlation matrix using ‘read.csv’ I did not use the correct settings, given my column headings were also numerical.
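
One way to test whether the headings alone change what read.csv returns is to import the same values under both heading styles and compare them. This is a minimal base R sketch on a made-up 3x3 matrix (the inline CSV strings are hypothetical, not my real files), using the same read.csv settings as above.

```r
# The same toy values written with numeric headings and with V-prefixed headings.
csv_numeric <- "ID,1,2,3\n1,0,0.8,0.7\n2,0.8,0,0.6\n3,0.7,0.6,0\n"
csv_v       <- "ID,V1,V2,V3\n1,0,0.8,0.7\n2,0.8,0,0.6\n3,0.7,0.6,0\n"

m_numeric <- as.matrix(read.csv(text = csv_numeric, header = TRUE,
                                row.names = 1, check.names = FALSE))
m_v       <- as.matrix(read.csv(text = csv_v, header = TRUE,
                                row.names = 1, check.names = FALSE))

# Compare the numeric contents with the dimnames stripped off.
same_values <- identical(unname(m_numeric), unname(m_v))
same_values

# Only the column names should differ between the two imports.
colnames(m_numeric)
colnames(m_v)
```

Running the same comparison on the two real files (for example `identical(unname(as.matrix(correlationmatrix)), unname(as.matrix(correlationmatrix_V)))`) would show whether the two imports actually hold the same numbers.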

However, I would like to understand why this meant cluster_louvain assigned group membership in the way it did. I am posting this in case it is useful to anyone who makes the same mistake I did above. Any insights would be welcome, and thank you for any advice!

StupidWolf
A.Robin
  • If you're concerned about the column headings, try `correlationmatrix <- read.csv("Correlations.csv", header = FALSE, skip = 1)`. That will automatically assign column names V1, V2, etc. – camille Apr 16 '18 at 14:10
  • That's very helpful! Thank you for the suggestion. – A.Robin Apr 17 '18 at 10:33
  • Have you ever found a solution to this? It might be worth notifying the igraph people, it looks a lot like a bug in the code. – mic Jan 29 '20 at 21:15
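
A small base R sketch of what the `header = FALSE, skip = 1` import suggested in the first comment produces, using a made-up inline CSV (hypothetical values). One thing to note, assuming the file's first column holds the IDs as in the question: without `row.names`, that ID column is read as an ordinary data column named `V1`.

```r
# Toy file: header row plus an ID column, as in the question's layout.
csv_numeric <- "ID,1,2,3\n1,0,0.8,0.7\n2,0.8,0,0.6\n3,0.7,0.6,0\n"

# Skip the header line and let R assign default column names.
df <- read.csv(text = csv_numeric, header = FALSE, skip = 1)

names(df)  # default names assigned by R
dim(df)    # the ID column is counted as data here
df$V1      # holds the IDs, not correlations
```

The ID column would therefore need to be dropped or moved to the row names before the data frame is treated as a square matrix.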
