I have a dataset from a survey where "yes" and "no" have been assigned 0 and 1, but missing answers (NAs) have been given the value 88888.

library(dplyr)
library(corrr)

a <- sample(c(0, 1, 88888), 25, replace = TRUE)
x <- sample(c(0, 1, 88888), 25, replace = TRUE)
y <- sample(c(0, 1, 88888), 25, replace = TRUE)
z <- sample(c(0, 1, 88888), 25, replace = TRUE)
dat <- tibble(a, x, y, z)

R is treating all of the data as numeric. I am trying to get the correlation between "a" and each of the other variables.

cor <- dat %>%
  correlate() %>%
  focus(a)

My results are a real mess!

The only solution that I can think of is to set all of the 88888 values to NA, but I am hoping that there is a better way to deal with this. The data that I am using has more than 200 categorical variables to consider and about 30 numeric ones.

Thanks!

Gilrob
  • Thanks, shortly after asking the question, I was indeed able to get it done using the NA approach! I guess I am just hoping that there is a more general approach that doesn't require me to define the values to be changed to NA, even if it's just the inverse, i.e. defining the answers that I am OK with. – Gilrob Jul 31 '23 at 05:00
  • You *can* treat missing answers in a survey as a distinct category instead of `NA` values. However, I don't understand why you use Pearson correlation for a categorical variable (and you certainly cannot do this when missing values are encoded as a large numeric value). I don't know which question your analysis is supposed to answer, but possibly you could just do a Chi-Squared test or calculate Cramer's correlation coefficient or ... – Roland Jul 31 '23 at 05:13
  • @Roland, because I am a beginner lol. I am really trying to get to a Cramer's correlation coefficient but struggling to find a way to do it across the whole dataset? Like a 2xn table with Cramer's coefficient for a with each variable in turn. Have any pointers for me please? – Gilrob Jul 31 '23 at 07:08

1 Answer

Changing all 88888 values to NA can be done with the following:

dat[dat == 88888] <- NA
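
If you would rather list the answers you accept than the codes you reject (the "inverse" asked about in the comments), something along these lines might work. This is only a minimal sketch: `valid` and `keep_valid` are names made up for the illustration, and it assumes every selected column uses the same 0/1 coding.

```r
library(dplyr)

# Hypothetical example data in the same shape as the question
set.seed(1)
dat <- tibble(
  a = sample(c(0, 1, 88888), 25, replace = TRUE),
  x = sample(c(0, 1, 88888), 25, replace = TRUE)
)

# Assumption: 0 and 1 are the only valid answers in these columns
valid <- c(0, 1)

# Keep values listed in `valid`; anything else becomes NA
keep_valid <- function(v, valid) replace(v, !(v %in% valid), NA)

dat_clean <- dat %>%
  mutate(across(everything(), ~ keep_valid(.x, valid)))
```

To leave the genuinely numeric columns alone, `everything()` could be swapped for a selection of the categorical columns only, for example `all_of()` with a character vector of their names.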

This also converts the value to NA in categorical columns.
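
Once the 88888 codes are NA, the Cramér's coefficient mentioned in the comments can be computed for "a" against each other variable in turn. The following is a rough sketch using only base R (`table` and `chisq.test`); the function name `cramers_v` and the example data are made up for the illustration, and no bias correction is applied.

```r
# Cramér's V for two categorical vectors, from the chi-squared statistic
cramers_v <- function(u, v) {
  tab <- table(u, v)                      # pairs with an NA are dropped by default
  k <- min(nrow(tab), ncol(tab)) - 1
  if (k < 1) return(NA_real_)             # undefined if a variable has one level
  chi2 <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  unname(sqrt(chi2 / (sum(tab) * k)))
}

# Hypothetical example data, with the missing code already set to NA
set.seed(1)
dat <- data.frame(
  a = sample(c(0, 1, NA), 25, replace = TRUE),
  x = sample(c(0, 1, NA), 25, replace = TRUE),
  y = sample(c(0, 1, NA), 25, replace = TRUE),
  z = sample(c(0, 1, NA), 25, replace = TRUE)
)

# One row per variable: Cramér's V between "a" and that variable
others <- setdiff(names(dat), "a")
result <- data.frame(
  variable = others,
  cramers_v = unname(vapply(others, function(nm) cramers_v(dat$a, dat[[nm]]), numeric(1)))
)
```

With many sparse categories the chi-squared approximation can be poor (hence the suppressed warnings), so the values are best read as a rough screening rather than a formal test.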

I hope it helps!