R: Replace NA in for loop with respective but changed variable name

Question

I have a multilevel dataset df on my hands with the following organization:

    
ID     Eye   Video_number  Time  Day   measurement1
40001   L         1         1     1        0.60 
40001   L         2         1     1        0.50 
40001   L         3         1     1        0.80 
40001   L         1         2     1        0.60 
40001   L         2         2     1        0.60 
40001   L         3         2     1        0.60

Goal I am trying to replace cell values of measurements that have a coefficient of variance above 45 with NA, since these values are probably less precise and should be excluded.

The coefficient of variation(sometimes denoted CV) of a distribution is defined as the ratio of the standard deviation to the mean, with $\mu$ and $\sigma$ values obtained from the raw data

I obtained the CV values by Time units (averaging measurement of three videos in one Time unit) with the following function and for loop. I got help from the following threads:

How to correctly use group_by() and summarise() in a For loop in R

Append data frames together in a for loop

# Define function
cv <- function(x){
  sd(na.omit(x))/mean(na.omit(x))*100}


# Variables
vars <- c("measurement1", "measurement2", "measurement3")

# Create a table with all CV values by ID, Eye, Day, and Time
df_cv=data.frame()
for (i in vars){
  df<-df.m2
  df$values<-df[,which(colnames(df.m2)==i)]
  x<-df%>%
    group_by(ID,Eye,Day,Time) %>%
    summarise(Count = n(),
              Mean = mean(values, na.rm = TRUE),
              SD = sd(values, na.rm = TRUE),
              CV = cv(values))%>%
    mutate(Variable=paste(i,"cv",sep="_"))
  df_cv<-rbind(df_cv,x)
  df_cv$CV[is.nan(df_cv$CV)]<-0    # for 0/0 on CV formula giving NaN

}

It resulted in the following table df_cv:

 ID    Eye Day Time Count      Mean         SD        CV         Variable
40001   L   1   1   3       0.56666667  0.057735027 10.1885342  measurement1_cv
40001   L   1   2   3       0.36666667  0.404145188 110.2214150 measurement1_cv
40001   L   1   3   3       0.50000000  0.000000000 0.0000000   measurement1_cv

I reformatted df_cv above to wide format (Variables and CVs across row rather than down a column). This enabled me to merge the CVs with the original df

df_cv<-dcast(df_cv,PIDN+Eye+Day+Time~Variable,value.var = "CV")
df<-merge(df,df_cv,by=c("PIDN","Eye","Day","Time"))

    
ID     Eye   Video_number  Time  Day   measurement1     measurement1_cv
40001   L         1         1     1        0.60           10.1885342
40001   L         2         1     1        0.50           10.1885342
40001   L         3         1     1        0.80           10.1885342
40001   L         1         2     1        0.80           110.2214150 
40001   L         2         2     1        0.30           110.2214150 
40001   L         3         2     1        0.00           110.2214150

I know want to input NAs into the cells of measurement 1 that have a CV>45. I know how to do this measurement by measurement, but I was wondering if there was a for loop capable of doing this, since I have a lot of variables I am analyzing.

df$measurement1[df$measurement1_cv>45]<-NA
df$measurement2[df$measurement2_cv>45]<-NA
df$measurement3[df$measurement3_cv>45]<-NA

Below are my failed attempts:

for (i in vars) {
  df<-df.m3
  df$i[df$i_cv>45]<-NA
}

Error in `$<-.data.frame`(`*tmp*`, "i", value = logical(0)) : 
  replacement has 0 rows, data has 609



for (i in vars) {
  df<-df.m3
  df$i[df$paste(i,"_cv")>45]<-NA
}

Error in df$paste(i, "_cv") : attempt to apply non-function

Any help is greatly appreciated!

`df[,c(2,4,17,23)] <- lapply(df[,c(2,4,17,23)], function(a) replace(a, a > 45, NA))` where the vector of numbers here are the columns you need. It can also be a logical vector, perhaps derived from `startsWith(colnames(df), "measurement")` — r2evans, Oct 06 '20 at 18:24
Thank you! However, I need to apply CV formula by Time instance, which is why I used `group_by(ID,Eye,Day,Time)` I tried the following: `lapply(df[,vars], function(cv) replace(cv,cv>45,NA))` and I don't think it recognizes the unit of Time by which to take the means and sd of the videos. It replaces many values with `NA`. — Shivvy, Oct 06 '20 at 18:49

score 0 · Answer 1 · answered Oct 06 '20 at 19:19

Two thoughts.

But first, augmenting your data a little.

df$measurement2 <- 0.9; df$measurement2_cv <- c(rep(46, 3), rep(44, 3))
library(dplyr)

inline function that operates only on one group at a time (i.e., it ignores grouping, so it must be within dplyr::do).

myfunc <- function(x) {
  CV <- grep("^measurement.*_cv", colnames(x), value = TRUE)
  MEAS <- gsub("_cv$", "", CV)
  CV <- CV[MEAS %in% colnames(x)]
  MEAS <- MEAS[MEAS %in% colnames(x)]
  x[,MEAS] <- Map(function(meas, cv) replace(meas, cv > 45, NA), x[,MEAS], x[,CV])
  x
}
df %>%
  group_by(ID, Eye, Day, Time) %>%
  do(myfunc(.))
# # A tibble: 6 x 9
# # Groups:   ID, Eye, Day, Time [2]
#      ID Eye   Video_number  Time   Day measurement1 measurement1_cv measurement2 measurement2_cv
#   <int> <chr>        <int> <int> <int>        <dbl>           <dbl>        <dbl>           <dbl>
# 1 40001 L                1     1     1          0.6            10.2         NA                46
# 2 40001 L                2     1     1          0.5            10.2         NA                46
# 3 40001 L                3     1     1          0.8            10.2         NA                46
# 4 40001 L                1     2     1         NA             110.           0.9              44
# 5 40001 L                2     2     1         NA             110.           0.9              44
# 6 40001 L                3     2     1         NA             110.           0.9              44

Pivot, calculate, then unpivot, using tidyr.

library(tidyr) # pivot_*
df %>%
  rename_with(.fn = ~ paste0(., "_val"), .cols = matches("^meas.*[^v]$")) %>%
  rename_with(.fn = ~ gsub("(.*)_(.*)", "\\2_\\1", .), .cols = starts_with("meas")) %>%
  pivot_longer(., matches("meas"), names_sep = "_", names_to = c(".value", "meas")) %>%
  mutate(val = if_else(cv > 45, NA_real_, val)) %>%
  pivot_wider(1:5, names_from = "meas", names_sep = "_", values_from = c("val", "cv"))
# # A tibble: 6 x 9
#      ID Eye   Video_number  Time   Day val_measurement1 val_measurement2 cv_measurement1 cv_measurement2
#   <int> <chr>        <int> <int> <int>            <dbl>            <dbl>           <dbl>           <dbl>
# 1 40001 L                1     1     1              0.6             NA              10.2              46
# 2 40001 L                2     1     1              0.5             NA              10.2              46
# 3 40001 L                3     1     1              0.8             NA              10.2              46
# 4 40001 L                1     2     1             NA                0.9           110.               44
# 5 40001 L                2     2     1             NA                0.9           110.               44
# 6 40001 L                3     2     1             NA                0.9           110.               44

(I admit that this looks/feels wonky, should need all of that renaming. But now you need to un-rename for the final step.) Perhaps the first solution is cleaner/better :-)

R: Replace NA in for loop with respective but changed variable name

1 Answers1