I am trying to run the cor.test function in a for loop, correlating a range of columns in a data frame against the rest of the columns in the same data frame. However, I have a lot of NA values throughout this data frame. I don't want to omit the rows altogether and lose the rest of the data in those rows, and I also don't want to set NA to 0, because that would interfere with the rest of the data (scores that are -1, 0, or 1). Every time I try to run cor.test, R says that x or y are not numeric vectors.

Is there any way to get around this?

The first column (rownames) of my data frame is a list of sample IDs, columns 2-50 are scores, and columns 51 onward are scores of a different type. What I've been doing so far is using a for loop to run cor.test between each range of columns, like this example:

cor.test(data[1:50], data[51:200])

This works fine in the for loop if I convert NA values to 0 but is there any way to avoid doing that?
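
One possible way to avoid the zero-fill, sketched under assumptions: `cor.test()` expects two numeric vectors rather than data frame slices, and its default method only uses observations where both vectors are non-missing. Looping over column pairs therefore drops NAs pair by pair instead of discarding whole rows. The toy data frame `scores` and its column names are made up for illustration and are not the real data.

```r
# Hypothetical toy data: scores of -1, 0, 1 with scattered NAs.
scores <- data.frame(
  a1 = c(1, 0, -1, NA, 1, 0, -1, 1, NA, 0),
  a2 = c(0, 1, NA, -1, 1, 1, 0, -1, 0, NA),
  b1 = c(1, NA, 0, 0, -1, 0, NA, 1, -1, 1),
  b2 = c(NA, 1, 0, -1, 0, 1, 1, -1, 0, 1)
)

x_cols <- c("a1", "a2")  # stands in for the first block of score columns
y_cols <- c("b1", "b2")  # stands in for the second block of score columns

rows <- list()
for (xc in x_cols) {
  for (yc in y_cols) {
    x <- scores[[xc]]  # [[ ]] extracts a numeric vector, not a data frame
    y <- scores[[yc]]
    ct <- cor.test(x, y)  # incomplete (x, y) pairs are dropped internally
    rows[[length(rows) + 1]] <- data.frame(
      x = xc,
      y = yc,
      n = sum(complete.cases(x, y)),  # observations actually used
      r = unname(ct$estimate),
      p = ct$p.value
    )
  }
}
results <- do.call(rbind, rows)
print(results)
```

Each pair can end up with a different `n`, so it is probably worth keeping that column next to the estimate and p-value when reading the results.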

Nzk211
  • How do you want to get around it? What makes sense for the type of inference you want to do? If you aren't sure how to analyze your data, then you should first get help at [stats.se], where statistical questions are on topic. Stack Overflow is for specific programming questions, and it doesn't sound like you know what you want to do in this case. You can't do calculations with missing values. Maybe you want to impute missing values? But then you need to decide what assumptions you want to make for your imputation model. Dropping missing values is certainly the easiest thing to do. – MrFlick Aug 16 '21 at 04:08
  • Thank you for the comment about Cross Validated! I wanted to omit the missing values and exclude them from the correlation matrix but I have no way of doing that without setting the values to zero and in doing so incorporating them into the matrix as potentially incorrectly significant values. Most rows of my df have at least one NA value so omitting rows altogether would not be ideal. I wanted to know if there was any other way to drop the missing values from cor.test or exclude them from the calculations – Nzk211 Aug 16 '21 at 04:25
  • `cor.test()` has an `na.action` argument when using the formula interface. Looks like the default is to drop missing values. See https://stats.stackexchange.com/questions/174198/cor-test-in-r-and-na-values/174204. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. The problem could be with `data[1:50]`. That's not the right way to subset a data.frame. Is that supposed to be 50 rows or columns? – MrFlick Aug 16 '21 at 04:31
  • Thanks! I'll take a look. I tried using na.action, but I suppose the problem may be the way I am subsetting the data frame. It is supposed to be the first 50 columns – Nzk211 Aug 16 '21 at 04:35
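
The `na.action` suggestion in the comments might look like the following with the formula interface. This is a minimal sketch on a made-up data frame (`scores` and its column names are hypothetical), showing that the formula call with `na.action = na.omit` should agree with calling `cor.test()` on the two column vectors directly.

```r
# Hypothetical toy data with NAs in both columns.
scores <- data.frame(
  a1 = c(1, 0, -1, NA, 1, 0, -1, 1, NA, 0),
  b1 = c(1, NA, 0, 0, -1, 0, NA, 1, -1, 1)
)

# Formula interface: rows with an NA in a1 or b1 are removed by na.omit.
ct_formula <- cor.test(~ a1 + b1, data = scores, na.action = na.omit)

# Vector interface: incomplete pairs are dropped as well.
ct_vector <- cor.test(scores$a1, scores$b1)

print(ct_formula$estimate)
print(ct_vector$estimate)
```

Passing `na.action = na.fail` instead would make the call stop with an error when NAs are present, which can be a useful check that missing values are being handled deliberately.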

0 Answers