1

I have a dataframe df with the following observations:

a <- c("A", "A", "A", "A", "B", "B","B", "B")
b <- c(11, 9, 4, 1, NA, 2,3,4)
c <- c(2,3, NA, NA, 25, 4, NA, 2)
d <- c(4,5, 3, NA, NA, 2,NA,NA)

df <- data.frame(a, b,c,d)
df
df <- data.frame(df)
colnames(df) <- c("Letter", "num1", "num2", "num3")
df

Now I would like to compute an effect size between the first column and each of the three other columns, using the cohen.d function from the effsize package, e.g. cohen.d(df$num1, df$Letter) or cohen.d(df$num2, df$Letter). However, before each calculation I need to remove the NA values of the numerical column being used. The idea that comes to mind is to run a for loop over the columns num1, num2, and num3. How can I use a for loop for the calculations in this case?

Anh
  • Reshape from wide-to-long, remove NA rows, split, then use lapply. – zx8754 Jul 28 '22 at 10:20
  • The line `df <- data.frame(df)` does nothing, `df` already is a data.frame. – Rui Barradas Jul 28 '22 at 10:22
  • @RuiBarradas oh yes, already excluded – Anh Jul 28 '22 at 10:24
  • @zx8754 oh yeah thank you.. I did that before too, and I also did my calculations as many single steps. However, the idea is that I would like to wrap everything into one function to get the dataframe of my final results. – Anh Jul 28 '22 at 10:31

3 Answers

3

This type of problem generally comes down to reshaping the data. The data is in wide format and should be in long format. See this post on how to reshape the data from wide to long format.
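As a rough illustration of what long format means for this data, here is a minimal base R sketch. It assumes the renamed columns from the question and uses stack() as a stand-in for reshape2::melt; the names num_cols and long_sketch are made up for this example.

```r
# Same data as in the question, with the renamed columns
df <- data.frame(
  Letter = c("A", "A", "A", "A", "B", "B", "B", "B"),
  num1 = c(11, 9, 4, 1, NA, 2, 3, 4),
  num2 = c(2, 3, NA, NA, 25, 4, NA, 2),
  num3 = c(4, 5, 3, NA, NA, 2, NA, NA)
)
num_cols <- c("num1", "num2", "num3")  # illustrative name

# stack() puts all values in one column ("values") and the source
# column name in another ("ind"); Letter is repeated once per column
long_sketch <- data.frame(
  Letter = rep(df$Letter, times = length(num_cols)),
  stack(df[num_cols])
)

# Dropping NA rows now removes missing values per variable,
# not whole rows of the wide table
long_sketch <- na.omit(long_sketch)
table(long_sketch$ind)
```

Each variable keeps its own non-missing observations, which is what allows the split/lapply step to treat the columns independently.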

The following code reshapes the data, pipes it to na.omit, then splits it by variable, applies the auxiliary function to each piece with lapply, and combines the results into a data.frame.

a <- c("A", "A", "A", "A", "B", "B","B", "B")
b <- c(11, 9, 4, 1, NA, 2,3,4)
c <- c(2,3, NA, NA, 25, 4, NA, 2)
d <- c(4,5, 3, NA, NA, 2,NA,NA)

df <- data.frame(a, b,c,d)
colnames(df) <- c("Letter", "num1", "num2", "num3")

faux <- function(x){
  e <- effsize::cohen.d(value ~ Letter, data = x)
  e2 <- unclass(e)
  c(e2[1:4], 
    lower = unname(e2$conf.int[1]), 
    upper = unname(e2$conf.int[2]), 
    e2[6:8])
}

long <- reshape2::melt(df, id.vars = "Letter") |> na.omit()
res <- lapply(split(long, long$variable), faux)
do.call(rbind.data.frame, res)
#>         method name   estimate        sd     lower    upper       var conf.level magnitude
#> num1 Cohen's d    d  0.9031263  3.598611 -1.155897 2.962150 0.6415931       0.95     large
#> num2 Cohen's d    d -0.7524094 10.410998 -3.754631 2.249812 0.8899453       0.95    medium
#> num3 Cohen's d    d         NA        NA        NA       NA        NA       0.95      <NA>

Created on 2022-07-28 by the reprex package (v2.0.1)


Edit

To run the code above as a for loop, assign the result of split to a variable, explicitly create a list to hold the results, and call the auxiliary function faux inside the loop.

sp <- split(long, long$variable)
res <- vector("list", length = length(sp))
for(i in seq_along(sp)) {
  res[[i]] <- faux(sp[[i]])
}
do.call(rbind.data.frame, res)
#>      method name   estimate        sd     lower    upper       var conf.level magnitude
#> 1 Cohen's d    d  0.9031263  3.598611 -1.155897 2.962150 0.6415931       0.95     large
#> 2 Cohen's d    d -0.7524094 10.410998 -3.754631 2.249812 0.8899453       0.95    medium
#> 3 Cohen's d    d         NA        NA        NA       NA        NA       0.95      <NA>

Created on 2022-07-28 by the reprex package (v2.0.1)
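If the loop should run directly over the wide columns, removing NA values one column at a time and keeping the column names on the result, something like the sketch below may work. To stay self-contained it does not call effsize; pooled_d is a hypothetical helper that computes Cohen's d with a pooled standard deviation, and num_cols, res and out are illustrative names. In real use, the helper call could be swapped for effsize::cohen.d.

```r
df <- data.frame(
  Letter = c("A", "A", "A", "A", "B", "B", "B", "B"),
  num1 = c(11, 9, 4, 1, NA, 2, 3, 4),
  num2 = c(2, 3, NA, NA, 25, 4, NA, 2),
  num3 = c(4, 5, 3, NA, NA, 2, NA, NA)
)

# Hypothetical helper (not part of effsize): difference in group means
# divided by the pooled standard deviation of the two groups
pooled_d <- function(x, g) {
  g <- factor(g)
  x1 <- x[g == levels(g)[1]]
  x2 <- x[g == levels(g)[2]]
  n1 <- length(x1)
  n2 <- length(x2)
  sd_pooled <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))
  (mean(x1) - mean(x2)) / sd_pooled
}

num_cols <- c("num1", "num2", "num3")
res <- setNames(vector("list", length(num_cols)), num_cols)

for (col in num_cols) {
  keep <- !is.na(df[[col]])  # NA removal for this column only
  res[[col]] <- data.frame(
    n = sum(keep),
    estimate = pooled_d(df[[col]][keep], df$Letter[keep])
  )
}

out <- do.call(rbind, res)  # row names are num1, num2, num3
out
```

For num3 the estimate is NA because group B has a single non-missing value, so its variance is undefined.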

Rui Barradas
  • Thank you for your solution. It worked so well. But I made a mistake: I want to solve this with a `for loop`. Could you reproduce it with a `for loop`, because that is what I have been trying? – Anh Jul 28 '22 at 13:03
  • @Anh Done, see the edit. – Rui Barradas Jul 28 '22 at 14:57
  • Hi @Rui Barradas Sorry for taking so long to get back to you. I also figured it out by the end of today. But I am very happy to learn another solution here :D – Anh Jul 28 '22 at 19:37
0

With your data, I get NA for all of the Cohen's d estimates and confidence intervals.

However, below is a way to get all the results at once in a list.

First, let's filter out NA values

library(dplyr)
df <- df %>% filter(!is.na(b) & !is.na(c) & !is.na(d))

Then, run the loop

# df still has its original column names a, b, c, d here
mycols <- letters[2:4]
lapply(mycols, function(x) effsize::cohen.d(df[, x], df$a))

[[1]]

Cohen's d

d estimate: NA (NA)
95 percent confidence interval:
lower upper 
   NA    NA 


[[2]]

Cohen's d

d estimate: NA (NA)
95 percent confidence interval:
lower upper 
   NA    NA 


[[3]]

Cohen's d

d estimate: NA (NA)
95 percent confidence interval:
lower upper 
   NA    NA 

This lapply call is nothing more than an (implicit) loop that returns its results in a list.
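To make that equivalence concrete, here is a small sketch that runs the same per-column operation once with an explicit for loop and once with lapply. It assumes the original column names a, b, c, d and, to keep it dependency-free, counts non-missing values instead of calling cohen.d; loop_res and lapply_res are illustrative names.

```r
df <- data.frame(
  a = c("A", "A", "A", "A", "B", "B", "B", "B"),
  b = c(11, 9, 4, 1, NA, 2, 3, 4),
  c = c(2, 3, NA, NA, 25, 4, NA, 2),
  d = c(4, 5, 3, NA, NA, 2, NA, NA)
)
mycols <- letters[2:4]

# Explicit loop: pre-allocate a list, fill it element by element
loop_res <- vector("list", length(mycols))
for (i in seq_along(mycols)) {
  loop_res[[i]] <- sum(!is.na(df[[mycols[i]]]))
}

# Implicit loop: lapply builds the same list in one call
lapply_res <- lapply(mycols, function(x) sum(!is.na(df[[x]])))

identical(loop_res, lapply_res)
```

Any function of a single column, including a call to effsize::cohen.d, could take the place of the counting step.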

gaut
  • Hi, thanks @gaut. Yes, that is just a fake dataset that I created. The idea is that I want to remove the `NA` values of each column on each run of my calculation. Btw, your code did not run. Can you check that again? – Anh Jul 28 '22 at 10:54
  • Oh, sorry... I had changed the column names... it worked – Anh Jul 28 '22 at 10:56
  • ah yes I forgot. edited the answer – gaut Jul 28 '22 at 11:00
0

First, to remove the NA values you can use tidyr::drop_na(), which removes every row that has an NA value in any column. Then the easiest loop is over the names of the columns you are interested in: create a vector of these names and use purrr::map to iterate over it.

df <- data.frame(
  Letter = c("A", "A", "A", "A", "B", "B","B", "B"),
  num1 = c(11, 9, 4, 1, NA, 2,3,4),
  num2 = c(2,3, NA, NA, 25, 4, NA, 2),
  num3 = c(4,5, 3, NA, NA, 2,NA,NA)) |>
  tidyr::drop_na() 

purrr::map(c('num1', 'num2', 'num3'),
           ~ effsize::cohen.d(df[[.x]], df$Letter))
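To see how much data each strategy keeps, the base R sketch below compares dropping rows with any NA against counting the non-missing values of each column separately. complete.cases() is used here as a stand-in for tidyr::drop_na(), and rows_complete and n_per_column are illustrative names.

```r
df <- data.frame(
  Letter = c("A", "A", "A", "A", "B", "B", "B", "B"),
  num1 = c(11, 9, 4, 1, NA, 2, 3, 4),
  num2 = c(2, 3, NA, NA, 25, 4, NA, 2),
  num3 = c(4, 5, 3, NA, NA, 2, NA, NA)
)

# Row-wise removal: only rows with no NA in any column survive
rows_complete <- nrow(df[complete.cases(df), ])
rows_complete
#> [1] 3

# Per-column view: observations available to each calculation
n_per_column <- colSums(!is.na(df[c("num1", "num2", "num3")]))
n_per_column
#> num1 num2 num3 
#>    7    5    4
```

With row-wise removal only one row of group B remains, so a standard deviation for that group cannot be estimated.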
mrjoh3
  • drop_na would drop all the rows with NA on any column, not sure OP wants that. – zx8754 Jul 28 '22 at 11:12
  • @mrjoh Hi, thank you for this. Unfortunately, :( I think removing all NA values at the beginning is not what I want – Anh Jul 28 '22 at 13:09