I want to find Spearman's rank correlation rho value between these variables.

V1  V2  V3  V4
A   SUV Yes Good
A   SUV No  Good
B   SUV No  Good
B   SUV Yes Satisfactory
C   car Yes Excellent
C   SUV No  Poor
D   SUV Yes Poor
D   van Yes Satisfactory
E   car No  Excellent


corr <- cor.test(x=df$V2, y=df$V3, method = "spearman")
corr

On running the code, I received the following error (Error 1):

Error in cor.test.default(x = df$V2, y = df$V3, method = "spearman") : 
  'x' must be a numeric vector

What I tried:

Based on the Stack Overflow discussion "How to convert a data frame column to numeric type?", I ran:

transform(df, V2 = as.numeric(V2))

However, running the above code produces the following warning (Error 2), and Error 1 still appears after the transformation:

Warning message:
In eval(substitute(list(...)), `_data`, parent.frame()) :
  NAs introduced by coercion
vp_050

1 Answer

According to ?cor.test,

x, y - numeric vectors of data values. x and y must have the same length.

One option is to convert each column to a factor and coerce it to integer:

cor.test(x=as.integer(factor(df$V2)), y=as.integer(factor(df$V3)), method = "spearman")

    Spearman's rank correlation rho

data:  as.integer(factor(df$V2)) and as.integer(factor(df$V3))
S = 95.158, p-value = 0.593
alternative hypothesis: true rho is not equal to 0
sample estimates:
      rho 
0.2070197 
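
Note that the integer codes come from the factor's level order, which by default is the sorted order of the labels and can vary with the locale. As a minimal sketch, assuming one wants the coding to be reproducible, the levels can be given explicitly. The level order chosen for V2 below is an arbitrary assumption for illustration (the categories have no natural ranking), and exact = FALSE is used only to request the asymptotic p-value, since the data contain ties:

```r
# Vectors copied from the V2 and V3 columns of the question's data
v2 <- c("SUV", "SUV", "SUV", "SUV", "car", "SUV", "SUV", "van", "car")
v3 <- c("Yes", "No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No")

# Explicit levels fix the integer codes regardless of locale sort order.
# The order car < SUV < van is an assumption, not a property of the data.
v2_codes <- as.integer(factor(v2, levels = c("car", "SUV", "van")))
v3_codes <- as.integer(factor(v3, levels = c("No", "Yes")))

# exact = FALSE skips the exact test, which cannot be computed with ties
res <- cor.test(x = v2_codes, y = v3_codes, method = "spearman", exact = FALSE)
res$estimate
```

With these levels the estimate should match the rho shown above; a different level order for V2 would generally give a different value, so the result is only meaningful if the chosen order is.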

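The transform() call in the question gives a warning and returns NA because it tries to convert a character column directly to numeric. Instead, convert to factor first and then to numeric/integer. Note that transform() returns a modified copy, so the result must be assigned back to df to keep the change: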

transform(df, V2 = as.numeric(factor(V2)))
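
To make the difference between the two conversions concrete, here is a small sketch. The vector v and the data frame d are hypothetical stand-ins for the V2 column, and the explicit level order is an assumption added so the codes do not depend on the locale:

```r
# Hypothetical character vector with the same labels as V2
v <- c("SUV", "car", "van", "SUV")

# Direct coercion: the labels are not numbers, so every element becomes NA
# (suppressWarnings hides the "NAs introduced by coercion" warning)
direct <- suppressWarnings(as.numeric(v))

# Via factor: each distinct label is mapped to its level index
codes <- as.numeric(factor(v, levels = c("car", "SUV", "van")))

# transform() returns a modified copy, so assign it to keep the change
d <- data.frame(V2 = v, stringsAsFactors = FALSE)
d <- transform(d, V2 = as.numeric(factor(V2, levels = c("car", "SUV", "van"))))
str(d)
```

After the assignment, d$V2 is numeric and can be passed to cor.test().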

data

df <- structure(list(V1 = c("A", "A", "B", "B", "C", "C", "D", "D", 
"E"), V2 = c("SUV", "SUV", "SUV", "SUV", "car", "SUV", "SUV", 
"van", "car"), V3 = c("Yes", "No", "No", "Yes", "Yes", "No", 
"Yes", "Yes", "No"), V4 = c("Good", "Good", "Good", "Satisfactory", 
"Excellent", "Poor", "Poor", "Satisfactory", "Excellent")), 
class = "data.frame", row.names = c(NA, 
-9L))
akrun