
I set up a correlation matrix in Excel, read it into R, and apply a few adjustments so that I can plot the result of a community detection at the end. This works just fine.

To adjust the plot interactively, I use the function tkplot(), which fails with this error message:

Error in col[idx] <- substr(col[idx], 1, 7) : NAs are not allowed in subscripted assignment

My matrix is 49x49. The problem appears to be caused by a few numbers in this matrix: if I set them to 1, tkplot() works. After restarting R, however, the offending numbers are no longer the same ones, so I have to find them by hand, changing some values to 1 in the Excel file and then checking whether the code works in R.

I am a novice in R, so I apologize if someone has already answered this question in another thread.

My code is the following:

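# Packages assumed from the functions used below (not stated in the original post)
library(readxl)    # read_excel()
library(psych)     # cor2dist()
library(reshape2)  # melt()
library(igraph)    # graph_from_data_frame(), cluster_louvain(), tkplot()

# Read Excel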
correlationmatrix_Sustainability <- read_excel("Official Research.xlsx", 
                                     sheet = "Corr_Sustainability")

# Delete first column and rename rows

correlationmatrix_Sustainability <- correlationmatrix_Sustainability[,-1]

column_names_Sustainability <- colnames(correlationmatrix_Sustainability)
rownames(correlationmatrix_Sustainability) <- column_names_Sustainability

# Work the magic

distancematrix_Sustainability <- cor2dist(correlationmatrix_Sustainability)


# If instead, you do not want variables with negative correlation to be connected, 
# just get rid of the absolute value above. This should be much less connected

DM2_Sustainability <- as.matrix(distancematrix_Sustainability)

## Zero out connections where there is low correlation
DM2_Sustainability[correlationmatrix_Sustainability < 0] = 0

# Correlation matrix as long list
matrix(DM2_Sustainability, dimnames=list(t(outer(colnames(DM2_Sustainability), rownames(DM2_Sustainability), FUN=paste)), NULL))

# Number of companies per node
nodes_Sustainability <- read_excel("Official Research.xlsx", 
                         sheet = "Nodes_Sustainability", col_names = TRUE)
# Edges: Correlation matrix as long list

correlationmatrix_Sustainability[correlationmatrix_Sustainability < 0] = 0

correlation_Sustainability <- as.data.frame(correlationmatrix_Sustainability)
rownames(correlation_Sustainability) <- column_names_Sustainability
correlation_Sustainability$rownames <- rownames(correlation_Sustainability)
edges_Sustainability <- melt(correlation_Sustainability,id.vars = "rownames")
colnames(edges_Sustainability) <- c("1","2","weights")

edges_Sustainability <- edges_Sustainability[edges_Sustainability$weights>0, ]
edges_Sustainability <- edges_Sustainability[complete.cases(edges_Sustainability$weights),]

graph_Sustainability <- graph_from_data_frame(d=edges_Sustainability,vertices = nodes_Sustainability,directed = FALSE)


# Louvain Method for community detection
clusterlouvain_Sustainability <- cluster_louvain(graph_Sustainability,weights = edges_Sustainability$weights)

# Change width of edges based on correlation weight
scaling_factor <- 25     # Define a scaling factor (adjust this according to your preference)
E(graph_Sustainability)$width <- E(graph_Sustainability)$weights*scaling_factor

# Change size of nodes based on company count
V(graph_Sustainability)$size <- V(graph_Sustainability)$Count

# Plot  graph
plot(graph_Sustainability, layout=layout_nicely, vertex.color=rainbow(5, alpha=0.6)
 [clusterlouvain_Sustainability$membership],edge.width=E(graph_Sustainability)$width, 
 red=100)          # Use res to increase resolution


# Get rid of unnecessary edges
cut.off <- 0.4 # Get rid of all edges with correlation below this cut-off

graph_Sustainability.reduced <- delete_edges(graph_Sustainability, E(graph_Sustainability)[weights<cut.off])
plot(graph_Sustainability.reduced, layout=layout_nicely, vertex.color=rainbow(5, alpha=0.6)
 [clusterlouvain_Sustainability$membership],edge.width=E(graph_Sustainability.reduced)$width)

# Change placement of nodes myself
# Use Fruchterman-Reingold or Kamada-Kawai
tkplot(graph_Sustainability, layout=layout_nicely, vertex.color=rainbow(5, alpha=0.6)  # Not reduced
   [clusterlouvain_Sustainability$membership],edge.width=E(graph_Sustainability)$width, 
   red=100)

tkplot(graph_Sustainability.reduced, layout=layout_nicely, vertex.color=rainbow(5, alpha=0.6) # Reduced
   [clusterlouvain_Sustainability$membership],edge.width=E(graph_Sustainability.reduced)$width, 
   red=100)

I also tried changing all numbers in the correlation matrix to 1 and using different functions to read the Excel file. Neither worked.

  • Can you construct a minimal reproducible example? https://stackoverflow.com/help/minimal-reproducible-example – Szabolcs Jul 27 '23 at 06:44
  • Thank you for your answer. In the meantime I was able to solve the problem, which lay in the coloring of the graph. I had hard-coded the number of colors to 5, so if more or fewer clusters are found in the data, tkplot cannot draw the graph. The fix is to get the number of clusters from the clustering itself, num_clusters <- length(unique(clusterlouvain_Tagliaro$membership)), and pass it on: ..., vertex.color=rainbow(num_clusters, alpha=0.6)... – Sebastian Krebs Jul 28 '23 at 07:05
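
The fix described in the last comment can be illustrated with a minimal base-R sketch. It assumes a made-up membership vector with six clusters (hypothetical data, not taken from the question) and only shows how indexing a five-color palette with a cluster number of 6 produces an NA color, which is presumably what tkplot() then trips over. It does not reproduce the tkplot() call itself.

```r
# Hypothetical membership vector: six clusters, one more than the palette holds.
membership <- c(1, 2, 3, 4, 5, 6, 2, 1)

# Fixed-size palette, as in the question: cluster 6 has no matching color,
# so indexing past the end of the palette yields NA.
fixed_palette <- rainbow(5, alpha = 0.6)
fixed_colors <- fixed_palette[membership]
anyNA(fixed_colors)   # TRUE

# Palette sized from the data, as in the comment: every cluster gets a color.
num_clusters <- length(unique(membership))
sized_palette <- rainbow(num_clusters, alpha = 0.6)
sized_colors <- sized_palette[membership]
anyNA(sized_colors)   # FALSE
```

Sizing the palette with length(unique(membership)) relies on the cluster ids running from 1 to the number of clusters without gaps; if that is in doubt, max(membership) would be a more defensive choice.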
