I have a CSV file containing a matrix:

version getSize() length() ... power
0         23000    23421        0.8
0           ..      ..           ..
1           ..      ..           ..
1           ..      ..           ..

I want to aggregate the rows that share the same version, applying the mean function to every column. There are too many columns to list them by hand. I also want to calculate the correlation matrix and bind the power column so that it appears at the sides of the plot. My code is this:

matrix <- read.csv("/home/francesco/University/UoA/matrix.csv", header=TRUE, sep=",", fileEncoding="windows-1252")
power <- matrix[,"power"]
binded <- cbind(matrix,power)
aggregated <- aggregate(. ~ version, data = binded, mean)
corMatrix <- cor(aggregated, method="spearman")
library(lattice)
levelplot(corMatrix)

The plot is pretty confusing and I get this warning:

Warning message:
In cor(aggregated, method = "spearman") : standard deviation is zero

A short extract of matrix.csv is:

version,native_drawBitmap,nPrepareDirty,nDrawDisplayList,startGC,power
00083,8,88,308,12,0.8967960131052847
00083,0,176,404,1,0.867644513259528
00084,8,88,307,10,0.8980234065469381
00084,0,181,408,1,0.871799879659241

Does anyone know what I'm doing wrong?

Thanks in advance

  • A few pointers: don't name your objects/variables after existing functions such as `matrix`. Don't use `cbind` unless you're sure all your variables are numeric. As for the warning, are you getting NAs as the result of your `cor` call? It's telling you that you have variables with zero variance. If you paste some of your data, we could help better. – infominer May 20 '14 at 20:38
  • When asking for help, it's important to make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). You don't have to (and shouldn't) post all of your data. Post as little as possible to re-create the exact error you are getting. See the linked article for more information on creating good examples. – MrFlick May 20 '14 at 20:46
  • The matrix is huge and I can't paste even a single full line. I've posted a few truncated lines of it. I hope this is useful. – user3657495 May 20 '14 at 21:00
  • @infominer: I get no output for that line, so I think the result is fine. – user3657495 May 20 '14 at 21:04

1 Answer

Well, with your sample data, the native_drawBitmap column becomes all 4s after aggregation. Since it then has no variance, you can't calculate a pairwise correlation between it and any other variable, and you get the warning. If you leave out this column, it will work. Here is an example.

#sample data in friendly copy/paste-able format
mm<-data.frame(
    version = c(83, 83, 84, 84), 
    native_drawBitmap = c(8, 0, 8, 0),
    nPrepareDirty = c(88, 176, 88, 181), 
    nDrawDisplayList = c(308, 404, 307, 408), 
    startGC = c(12, 1, 10, 1), 
    power = c(0.896796013105285, 0.867644513259528, 
        0.898023406546938, 0.871799879659241)
)

# these lines are not needed and don't make sense. Why are you
# trying to re-add the column from mm back onto mm?
# power <- mm[,"power"]
# binded <- cbind(mm,power)
aggregated <- aggregate(. ~ version, data = mm, mean)

# warning: native_drawBitmap has zero standard deviation
corMatrix <- cor(aggregated, method="spearman")
# no warning: column 2 (native_drawBitmap) is dropped
corMatrix <- cor(aggregated[,-2], method="spearman")

You may have other columns in your data that have no variability after aggregation. Be sure to find these and remove them.
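With many columns, checking by hand is not realistic, so here is a minimal sketch of one way to flag the constant columns automatically. It rebuilds the sample data so it runs on its own; `is_constant` and `dropped` are names made up for this illustration, and the exact-equality test is an assumption that may need a tolerance on real floating-point data.

```r
# Sample data from the answer, rebuilt so this block is self-contained
mm <- data.frame(
    version = c(83, 83, 84, 84),
    native_drawBitmap = c(8, 0, 8, 0),
    nPrepareDirty = c(88, 176, 88, 181),
    nDrawDisplayList = c(308, 404, 307, 408),
    startGC = c(12, 1, 10, 1),
    power = c(0.896796013105285, 0.867644513259528,
        0.898023406546938, 0.871799879659241)
)
aggregated <- aggregate(. ~ version, data = mm, mean)

# TRUE for every column whose values are all identical after aggregation
# (is_constant is a hypothetical name, not part of base R)
is_constant <- vapply(aggregated, function(x) length(unique(x)) == 1, logical(1))

# Names of the columns that would trigger the zero-deviation warning
dropped <- names(aggregated)[is_constant]

# Correlation on the remaining columns only; drop = FALSE keeps a data frame
# even if a single column survives
corMatrix <- cor(aggregated[, !is_constant, drop = FALSE], method = "spearman")
```

Printing `dropped` before discarding anything is a cheap sanity check: it shows which columns lost all variability once the rows were averaged per version.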

MrFlick
  • I have 17000 columns, so I guess that's not practical. Do you know how to discard columns with no variability automatically? Thanks – user3657495 May 20 '14 at 21:20
  • You can look for columns with very small ranges `badcols<-lapply(aggregated, function(x) diff(range(x))<.0001)`. Testing for exactly 0 might lead to problems with precision. Then do `cor(aggregated[,!badcols])` – MrFlick May 20 '14 at 21:35
  • You could also calc. the variance and exclude the ones with zero variance `nearZeroVarcols <- which(sapply(aggregated, var) == 0)` – infominer May 20 '14 at 21:42
  • @MrFlick: These commands give me this error: `Error in !badcols : invalid argument type` – user3657495 May 20 '14 at 21:50
  • @user3657495 sorry, that was supposed to be `sapply`, not `lapply`. – MrFlick May 20 '14 at 21:55
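
To illustrate why the `lapply` version fails and the `sapply` version works, here is a small sketch on a toy data frame. The values and the name `negation_fails` are invented for the example; the `.0001` threshold is the one suggested in the comment above and may need adjusting for other data.

```r
# Toy stand-in for the aggregated data (hypothetical values): column b is constant
aggregated <- data.frame(a = c(1, 2, 3), b = c(4, 4, 4), c = c(9, 7, 8))

# lapply() returns a list, and `!` cannot negate a list
as_list <- lapply(aggregated, function(x) diff(range(x)) < .0001)
negation_fails <- inherits(try(!as_list, silent = TRUE), "try-error")

# sapply() simplifies the result to a named logical vector, which can be negated
badcols <- sapply(aggregated, function(x) diff(range(x)) < .0001)

# Keep only the columns that vary, then correlate them
corMatrix <- cor(aggregated[, !badcols, drop = FALSE], method = "spearman")
```

Here `negation_fails` ends up `TRUE`, which reproduces the reported error, while `badcols` flags only column `b`, so the correlation is computed on `a` and `c`.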