I am running a cor.test on two columns within a file/table.

tmp <- read.table(files_to_test[i], header=TRUE, sep="\t")
## Obtain Columns To Compare ##
colA <-tmp[compareA]
colB <-tmp[compareB]
# sctr = 'spearman cor.test result'
sctr <- cor.test(colA, colB, alternative="two.sided", method="spearman")

But I am getting this confounding error...

Error in cor.test.default(colA, colB, alternative = "two.sided", method = "spearman") : 
'x' must be a numeric vector

The values in the columns ARE numbers, but

is.numeric(colA) = FALSE
class(colA) = data.frame

What have I missed?

mccurcio
  • What are the results of `str(colA)` and `str(colB)`? I'm guessing the data were read in as factors or character data, probably because there is an errant character in the data that you're reading in. – Chase Oct 04 '11 at 18:12
  • @Chase: I upvoted your comment but then realized it's not the issue (see both answers below). In hindsight I think it's a little misleading, but I can't remove my upvote ... – Ben Bolker Oct 04 '11 at 19:03
  • @Ben and @Chase: The advice to give `str` results is good though. (I gave the second upvote). – Aaron left Stack Overflow Oct 04 '11 at 20:25

2 Answers


Put a comma before your selector. Indexing a data.frame with a single index and no comma treats it as a list: it returns a sub-list of the same type, so the result is still a data.frame (with one column). A data.frame also supports matrix-style indexing, [rows, columns], and selecting a single column that way returns a simple vector. So just change

colA <-tmp[compareA]
colB <-tmp[compareB]

to

colA <-tmp[,compareA]
colB <-tmp[,compareB]

I think this is more in keeping with the spirit of the data.frame type than double-bracket ([[) selectors, which do something similar but in the spirit of the underlying list type. Double brackets also look nothing like the selectors used for individual items and rows. So, in code that does several kinds of things with a data.frame, the double-bracket selectors stand out as a bit of an odd duck.
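A minimal sketch of the three selector forms side by side, using a made-up data frame in place of the file read in the question. The values and the column names `x` and `y` are assumptions for illustration only, as is treating `compareA` and `compareB` as single column names.

```r
# Hypothetical stand-in for the table read by read.table() in the question.
tmp <- data.frame(x = c(1, 2, 3, 4, 5), y = c(2, 1, 4, 3, 5))
compareA <- "x"   # assumed: a single column name
compareB <- "y"

# Single bracket, no comma: list-style subsetting, still a data.frame.
class(tmp[compareA])               # "data.frame"

# Matrix-style and list-style extraction both give a plain numeric vector.
colA <- tmp[, compareA]
colB <- tmp[[compareB]]
is.numeric(colA)                   # TRUE
identical(colA, tmp[[compareA]])   # TRUE

# One difference: with several columns, [, ] returns a data.frame again,
# whereas [[ ]] is meant for one column at a time.
class(tmp[, c("x", "y")])          # "data.frame"

sctr <- cor.test(colA, colB, alternative = "two.sided", method = "spearman")
sctr$estimate                      # rho is 0.8 for these made-up values
```

So the two fixes agree when exactly one column is selected. If `compareA` could ever hold more than one column name, the comma form would hand `cor.test` a data.frame again.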

John
  • this works as well as my solution above. I wonder if there are corner cases where they behave differently ... – Ben Bolker Oct 04 '11 at 18:40

Try tmp[[compareA]] and tmp[[compareB]] instead of single brackets. You wanted to extract numeric vectors; what you did instead was to extract single-column data frames. Compare the following:

> z <- data.frame(a=1:5,b=1:5)
> str(z["a"])
'data.frame':   5 obs. of  1 variable:
 $ a: int  1 2 3 4 5
> is.numeric(z["a"])
[1] FALSE
> str(z[["a"]])
 int [1:5] 1 2 3 4 5
> is.numeric(z[["a"]])
[1] TRUE

Try these out with cor.test:

Single brackets: error as above.

> cor.test(z["a"],z["b"])
Error in cor.test.default(z["a"], z["b"]) : 'x' must be a numeric vector

Double brackets: works.

> cor.test(z[["a"]],z[["b"]])

    Pearson's product-moment correlation

data:  z[["a"]] and z[["b"]] 
[snip snip snip]

As @Aaron points out below, cor will handle single-column data frames fine, by converting them to matrices, but cor.test doesn't. (This could be brought up on r-devel@r-project.org, or perhaps submitted to the R bug tracker as a wish-list item ...)
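The difference between the two functions can be checked with a short sketch like the one below. The data values are made up, and `tryCatch` is used only so the script keeps running past the error.

```r
z <- data.frame(a = c(1, 2, 3, 4, 5), b = c(2, 1, 4, 3, 5))  # made-up values

# cor() converts data frames to matrices, so this returns a 1 x 1 matrix.
r_mat <- cor(z["a"], z["b"])
dim(r_mat)

# cor.test() insists on numeric vectors; capture the error message.
res <- tryCatch(cor.test(z["a"], z["b"]),
                error = function(e) conditionMessage(e))
res

# Extracting the vectors first works for both functions.
r_vec <- cor(z[["a"]], z[["b"]])
ct <- cor.test(z[["a"]], z[["b"]])
all.equal(r_vec, unname(ct$estimate))
```

Here `res` ends up holding the error message as a character string rather than a test result, while the double-bracket calls give the same correlation from both functions.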

See also: Numeric Column in data.frame returning "num" with str() but not is.numeric(), What's the biggest R-gotcha you've run across? (maybe others)

Ben Bolker