In SPSS you can enter the data as 0 and 1 and then mark the variable as nominal. You can then calculate whatever you want, such as a Pearson or Spearman correlation. In R, however, you have to declare the variable as a factor even when it is coded numerically, and it is then treated like a string. Now when I use cor(), it doesn't work because it needs numeric input.

How do you overcome this?

An example is given below:

data(Titanic)
Titanic <- data.frame(Titanic) 
cor(Titanic$Sex, Titanic$Freq)
Omar113
  • Could you please provide a reproducible example of what you mean? – Dimitris Rizopoulos Sep 28 '18 at 09:18
  • You can just use `as.integer()` around your factor variable so that it can be used in the calculation. – hannes101 Sep 28 '18 at 09:23
  • @DimitrisRizopoulos I have data called "dat" with 2 columns: gender and age. I want to calculate the Pearson correlation for this data. Gender is coded M and F. I want to use cor to get a p-value – Omar113 Sep 28 '18 at 09:23
  • @hannes101 What if it's actually entered as text? Should I recode all the data into numbers?! – Omar113 Sep 28 '18 at 09:25
  • Please show us some of the data; you can use `dput()` on a smaller subsample of 10 observations and show it to us. – hannes101 Sep 28 '18 at 09:26
  • Your question is unclear, please read and edit your question according to: [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – nghauran Sep 28 '18 at 09:27
  • What do you want to do? Convert a factor variable to numeric? Numeric to factor? – nghauran Sep 28 '18 at 09:29
  • I have updated the post with simple example @ANG – Omar113 Sep 28 '18 at 09:36
  • `cor()` takes two numeric vectors. Here `Titanic$Sex` is not numeric. – nghauran Sep 28 '18 at 09:44
  • In SPSS a similar function works fine because the variable Sex is stored as numeric with type nominal. – Omar113 Sep 28 '18 at 09:46
  • Brute-force transforming a categorical variable into numeric and calculating a *correlation* is wrong. I would suggest using regression, for example: `lm(Freq ~ Sex, Titanic)` – pogibas Sep 28 '18 at 10:01

1 Answer

How do you overcome this?

Two ways:

  1. Feed the data to cor() in the form the function expects:
data(Titanic)
Titanic <- data.frame(Titanic) 
cor(Titanic$Sex, Titanic$Freq) # Fails: Titanic$Sex is a factor, not numeric
# Error in cor(Titanic$Sex, Titanic$Freq) : 'x' must be numeric
cor(as.numeric(Titanic$Sex), Titanic$Freq) # Works: as.numeric() returns the factor's integer level codes
# [1] -0.294397

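If you don't want to type out as.numeric, older versions of R (before 4.1.0) also let you use c(), which dropped the factor attributes. Since R 4.1.0, c() on a factor returns a factor, so this shortcut fails with the same error and as.numeric() is needed:

cor(c(Titanic$Sex), Titanic$Freq) # only works in R < 4.1.0
# [1] -0.294397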
  2. If you don't want to do that every time, you can define your own cor() wrapper that does the conversion for you (it masks stats::cor in your session):
cor <- function(x, y, ...) {
    if ( !is.numeric(x) ) {
        message("Converting x to numeric.")
        x <- as.numeric(x)
    }
    if ( !is.numeric(y) ) {
        message("Converting y to numeric.")
        y <- as.numeric(y)
    }
    return(stats::cor(x, y, ...))
}

data(Titanic)
Titanic <- data.frame(Titanic) 
cor(Titanic$Sex, Titanic$Freq)

# Converting x to numeric.
# [1] -0.294397
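
Both approaches rely on the same conversion, so it may help to see what it produces. The following is a minimal sketch, assuming the same built-in Titanic data; the names `titanic_df`, `sex_codes`, and `is_female` are illustrative and not from the original post. `as.numeric()` on a factor returns its integer level codes (here 1 for Male and 2 for Female), so the sign of the coefficient depends only on the level order. Since a comment on the question asks for a p-value, the sketch also calls `cor.test()`, which reports one where `cor()` does not.

```r
# Work on a copy so the built-in Titanic table is left untouched.
titanic_df <- as.data.frame(datasets::Titanic)

levels(titanic_df$Sex)                    # level order determines the codes
sex_codes <- as.numeric(titanic_df$Sex)   # integer level codes, not the labels

# An explicit 0/1 indicator (hypothetical name) makes the coding visible.
is_female <- as.integer(titanic_df$Sex == "Female")

r_codes <- stats::cor(sex_codes, titanic_df$Freq)
r_dummy <- stats::cor(is_female, titanic_df$Freq)

# cor() returns only the coefficient; cor.test() adds a p-value.
ct <- stats::cor.test(is_female, titanic_df$Freq)
ct$p.value
```

Because the indicator is a linear rescaling of the level codes, `r_codes` and `r_dummy` are equal. Calling `stats::cor` explicitly avoids picking up any user-defined `cor()` in the session.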

Why won't R do things like SPSS?

  1. It's different software. You may have built up certain assumptions or expectations working with one particular piece of software for some time, but you should lose the expectation that other software will, or should, work the same way.
  2. R's way may be more appropriate. You can see some discussion in PoGibas's comment, and in a related discussion on Cross Validated.
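
As a rough illustration of the regression route suggested in PoGibas's comment, the sketch below fits `lm(Freq ~ Sex)` on the same data and compares it with an equal-variance two-sample t-test. The names `titanic_df`, `fit`, `tt`, `p_lm`, and `p_t` are illustrative assumptions, and `Freq` here is a cell count from an aggregated table, so the numbers only demonstrate the mechanics.

```r
titanic_df <- as.data.frame(datasets::Titanic)

# Regress the numeric variable on the two-level factor; R builds the
# dummy coding itself, so no manual conversion is needed.
fit <- stats::lm(Freq ~ Sex, data = titanic_df)
coef(summary(fit))   # the SexFemale row is the Female minus Male difference in mean Freq

# With two groups, the slope test matches a pooled-variance t-test.
tt <- stats::t.test(Freq ~ Sex, data = titanic_df, var.equal = TRUE)

p_lm <- coef(summary(fit))["SexFemale", "Pr(>|t|)"]
p_t  <- tt$p.value
```

The two p-values agree, and they also match what `cor.test()` would give for a 0/1 coding of Sex, since these tests are algebraically equivalent for a two-level factor. The regression form has the advantage of reporting the group difference in the units of the numeric variable.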
duckmayr
  • Typical correlation metrics (a la Pearson, Spearman, Kendall) don't make sense for categorical data. From a statistical point of view, calculating a Pearson's product moment correlation coefficient between a categorical variable "turned numeric" and a continuous variable therefore makes no sense, and I would strongly advise against such a practice. There exist quite a few interesting posts on Cross Validated that discuss alternative approaches towards establishing a relationship between a categorical and continuous variable. – Maurits Evers Sep 28 '18 at 10:34
  • @MauritsEvers I agree, which is why I link to one such discussion on Cross Validated in point two under my second heading, and also mention PoGibas's comment on the question about the same issue – duckmayr Sep 28 '18 at 11:07