
This is what I have tried so far. It works, but it only returns a p-value for columns that have no NAs. Many of my columns have NA values, ranging from a few entries up to a third of the data.

    normal <- apply(cor_phys, 2, function(x) shapiro.test(x)$p.value)

I tried adding `na.rm` to the function, but it's not working. Help?

rawr
Madison
  • 1
    `normal <- apply(cor_phys, 2, function(x) shapiro.test(x[!is.na(x)])$p.value)` There is no argument you can apply to `shapiro.test` to remove `NA` values, you just need to subset the vector itself to exclude `NA`, then supply that to `shapiro.test`. – caldwellst Feb 27 '20 at 19:26
  • 2
    @caldwellst `shapiro.test` has this line `x <- sort(x[complete.cases(x)])` which removes NAs so there must be something else causing the problem – rawr Feb 27 '20 at 19:36
  • 1
    Ah, nice catch. Could be coming from this in the documentation of `shapiro.test` then. "Missing values are allowed, but the number of non-missing values must be between 3 and 5000." – caldwellst Feb 27 '20 at 19:40
  • After adding the `[!is.na(x)]`, the output is still the same. Any columns with even one `NA` are omitted from the output. – Madison Feb 27 '20 at 19:46
  • @MadisonPope can you add the output of `dput(head(cor_phys))` to your question – rawr Feb 27 '20 at 19:47
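
To make the points in these comments concrete, here is a minimal sketch. It assumes a small made-up data frame (`toy`) in place of `cor_phys`, and the helper name `safe_shapiro_p` is hypothetical. It drops the `NA`s explicitly and returns `NA` instead of raising an error when a column has fewer than 3 or more than 5000 non-missing values, or when all of its values are identical, which are cases where `shapiro.test` stops with an error.

```r
# Toy stand-in for cor_phys (made-up data, for illustration only)
set.seed(1)
toy <- data.frame(a = rnorm(30), b = rnorm(30), c = rnorm(30))
toy$b[1:10] <- NA   # a third of column b is missing
toy$c[1:28] <- NA   # column c keeps only 2 non-missing values

# Hypothetical helper: p-value of the Shapiro-Wilk test on the non-missing values,
# or NA when the test cannot be run on that column
safe_shapiro_p <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) < 3 || length(x) > 5000 || length(unique(x)) < 2) {
    return(NA_real_)
  }
  shapiro.test(x)$p.value
}

# One p-value per column; columns that cannot be tested come back as NA
normal <- apply(toy, 2, safe_shapiro_p)
print(normal)
```

With this layout, columns `a` and `b` get a p-value and column `c` is reported as `NA` rather than aborting the whole `apply` call.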

1 Answer

    # Assumes the tidyverse is installed (provides %>%, mutate, gather, rownames_to_column, map_chr)
    library(tidyverse)

    # Calculate the correlations between all variables
    corres <- cor_phys %>%                  # cor_phys is my data
      as.matrix() %>%
      cor(use = "complete.obs") %>%         # complete.obs drops every row that contains an NA
      as.data.frame() %>%
      rownames_to_column(var = 'var1') %>%
      gather(var2, value, -var1)

    # Remove duplicate correlations (keep one row per unordered pair of variables)
    corres <- corres %>%
      mutate(var_order = paste(var1, var2) %>%
               strsplit(split = ' ') %>%
               map_chr(~ sort(.x) %>%
                         paste(collapse = ' '))) %>%
      mutate(cnt = 1) %>%
      group_by(var_order) %>%
      mutate(cumsum = cumsum(cnt)) %>%      # 1 for the first copy of a pair, 2 for the second
      filter(cumsum != 2) %>%
      ungroup() %>%
      select(-var_order, -cnt, -cumsum)     # remove the helper columns

I did not write this myself, but it is the answer that I used, and it worked for my needs. It comes from the page "How to compute correlations between all columns in R and detect highly correlated variables".
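
For comparison, the same result (one row per unordered pair of variables, including each variable paired with itself) can be sketched in base R. This is only an illustration under assumptions: `toy` is made-up data standing in for `cor_phys`, and `pairs_df` is a hypothetical name. It is not the approach from the linked page; it uses `upper.tri` to pick one copy of each pair.

```r
# Toy stand-in for cor_phys (made-up data, for illustration only)
set.seed(42)
toy <- data.frame(x = rnorm(20), y = rnorm(20), z = rnorm(20))
toy$y[1:5] <- NA

# Correlation matrix; "complete.obs" drops every row that contains an NA
cm <- cor(as.matrix(toy), use = "complete.obs")

# Row/column positions of the upper triangle, diagonal included
idx <- which(upper.tri(cm, diag = TRUE), arr.ind = TRUE)

# Long format: one row per unordered pair of variables
pairs_df <- data.frame(
  var1  = rownames(cm)[idx[, "row"]],
  var2  = colnames(cm)[idx[, "col"]],
  value = cm[idx]
)
print(pairs_df)
```

Note that `use = "complete.obs"` discards a whole row as soon as any column has an `NA` in it; `use = "pairwise.complete.obs"` is the documented alternative that uses all rows available for each pair, which may matter when up to a third of a column is missing.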

Madison