
I like the cor() function but would like to know how to get a count of the number of pairs in a sparse matrix giving rise to the correlation values. Many thanks.

Here's a very simplified version of the kind of data table I have. [image: example data table]

I've also added a cut down version of the actual file I'm working on here

What I'd like is a way to produce something like the following matrix (taken from the picture rather than from the linked file): [image: example desired output]

This shows how many pairs of values there are in common between columnA and columnB so I can see which columns are worth comparing in a correlation. Does that make sense?

dput(mat)
structure(list(
  A = c(9.4, 9.4, 4.7, 1.2, NA, 0.6, 7.712, 0.2, NA, NA, 3.13, NA, 1.56, 6.25, NA, NA, 0.9471, NA, 1.56, 1.2, 0.78, NA, NA, NA, NA, NA, NA),
  B = c(4.7, 12.5, 2.3, 2.3, 9.4, 0.78, 9.45, 0.6, NA, NA, 3.13, NA, 2.3, 6.25, NA, NA, 10.72, NA, 2.3, 12.5, 6.25, NA, NA, NA, NA, NA, NA),
  C = c(4.7, 9.4, 4.7, 0.6, NA, 0.6, 10.84, 0.2, 3.67, 2.345, 3.13, 3.288, 1.56, 9.4, 11.21, 0.6, 2.256, 50, 1.56, 3.13, 0.78, 18.7, 0.66, 1.2, 6.26, 6.258, 50)),
  .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA, -27L))

Canute201
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610). This will make it much easier for others to help you. – Jaap Feb 23 '18 at 16:08

1 Answer

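library(reshape2)  # dcast() comes from reshape2

outdf <- c()
for (x in colnames(mat)) {
  for (y in colnames(mat)) {
    # Keep only the two columns being compared
    subset <- mat[, c(x, y)]
    # Count the rows where both columns are non-missing
    number_complete <- nrow(subset[complete.cases(subset), ])
    row <- c(x, y, number_complete)
    outdf <- rbind(outdf, row)
  }
}
# rbind() built a character matrix; data.frame() names its columns X1, X2, X3
outdf <- data.frame(outdf)
# Spread the long table into a column-by-column table of counts
dcast(outdf, X1 ~ X2, value.var = "X3")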
# X1  A  B  C
# 1  A 14 14 14
# 2  B 14 15 14
# 3  C 14 14 26
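
For what it's worth, the same counts can probably be had without the loop or reshape2. This is only a sketch of an alternative, not what the answer above does: it marks which cells are non-missing and takes the cross-product of that indicator matrix, so entry [i, j] is the number of rows where columns i and j are both present. The names `present` and `pair_counts` are made up for illustration, and `mat` is rebuilt from the dput() in the question.

```r
# Rebuild the example data from the dput() in the question.
mat <- data.frame(
  A = c(9.4, 9.4, 4.7, 1.2, NA, 0.6, 7.712, 0.2, NA, NA, 3.13, NA, 1.56, 6.25,
        NA, NA, 0.9471, NA, 1.56, 1.2, 0.78, NA, NA, NA, NA, NA, NA),
  B = c(4.7, 12.5, 2.3, 2.3, 9.4, 0.78, 9.45, 0.6, NA, NA, 3.13, NA, 2.3, 6.25,
        NA, NA, 10.72, NA, 2.3, 12.5, 6.25, NA, NA, NA, NA, NA, NA),
  C = c(4.7, 9.4, 4.7, 0.6, NA, 0.6, 10.84, 0.2, 3.67, 2.345, 3.13, 3.288, 1.56,
        9.4, 11.21, 0.6, 2.256, 50, 1.56, 3.13, 0.78, 18.7, 0.66, 1.2, 6.26,
        6.258, 50)
)

# 1 where a value is present, 0 where it is NA (hypothetical helper name).
present <- !is.na(as.matrix(mat))
storage.mode(present) <- "numeric"

# t(present) %*% present: entry [i, j] counts rows where both columns are non-missing.
pair_counts <- crossprod(present)
print(pair_counts)

# Correlations computed on those same pairwise-complete rows.
print(cor(mat, use = "pairwise.complete.obs"))
```

Reading `pair_counts` next to the `cor()` output shows how many pairs each correlation rests on; the diagonal is simply the number of non-missing values in each column.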
AidanGawronski
  • That looks awesome Aidan, I understand some of it, but can't get it to work on my example (maybe presence of NA is a problem?) - sorry I didn't post a link to my actual data until now. I'm doing mat <- my_data_frame & then trying to execute your code. I should add that I'm using RStudio, but I assume that shouldn't make a difference – Canute201 Feb 23 '18 at 17:53
  • Jaap mentions a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) ... this means we can actually run the code you are having a problem with. Please provide that. – AidanGawronski Feb 23 '18 at 18:06
  • I don't have any code, just the data - I've now edited the problem with a link to the data file I'm using. Many thanks – Canute201 Feb 23 '18 at 18:10
  • Please use dput(your_dataframe) so that we know exactly what data you are working with (and the type). (For example you must have code that reads in the csv, and converts it to a sparse matrix ... we need that code, or the end result). – AidanGawronski Feb 23 '18 at 18:17
  • Aidan, that's perfect. Very many thanks for all your help & apologies for my beginner errors on here. – Canute201 Feb 23 '18 at 18:45
  • No problem. If it was helpful, you can "accept" this answer by clicking the check mark next to it. – AidanGawronski Feb 23 '18 at 18:57
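
On the dput() request in the comments: a rough illustration of why it helps. dput() prints R code that rebuilds the object, column types and NAs included, so anyone can paste it and get the same data. The tiny data frame `small` and the names `txt` and `rebuilt` are invented for this sketch only.

```r
# A tiny stand-in data frame with missing values (hypothetical example data).
small <- data.frame(A = c(9.4, NA, 4.7), B = c(4.7, 12.5, NA))

# dput() writes out R code that recreates the object; capture it as text.
txt <- capture.output(dput(small))
cat(txt, sep = "\n")

# Evaluating that text rebuilds the data frame.
rebuilt <- eval(parse(text = txt))
print(all.equal(small, rebuilt))
```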