
I have a very large data frame with missing values. For some groups, a few columns contain only missing values.

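# One row per column of dd; one result column is added per group
analysis <- data.frame(Col = names(dd), stringsAsFactors = FALSE)
# Accumulator for the per-column modes, created once before the group loop
c <- c()

for (i in 1:3) {
  # Rows belonging to group i
  df_group <- subset(dd, dd$group == i)
  for (col in colnames(df_group)) {
    # Most frequent value; table() drops NAs, so an all-NA column gives length 0
    indx <- tail(names(sort(table(df_group[, col]))), 1)
    # Replace an empty result with NA so every column contributes one entry
    indx <- ifelse(length(indx) == 0, NA, indx)
    c <- append(c, indx)
  }
  analysis <- cbind(analysis, c)
}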

Without the `ifelse`, `c` came out too short (nothing was appended for columns that contain only NAs). With the `ifelse`, `c` is too long. Is there another way to handle this instead of the `ifelse`?
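A minimal sketch of the same loop with the accumulator recreated at the start of each group, assuming `dd` has a `group` column taking the values 1 to 3. The toy `dd` below is hypothetical, and the names `modes`, `group_1`, `group_2` and `group_3` are illustrative choices, not from the original code. Because the vector is rebuilt for every group, it always ends up with exactly one entry per column, and a plain `if` stands in for the `ifelse`.

```r
# Hypothetical toy data: column y is entirely NA within group 2
dd <- data.frame(
  group = c(1, 1, 1, 2, 2, 3, 3),
  x     = c("a", "a", "b", "c", "c", "d", "d"),
  y     = c("u", "v", "v", NA, NA, "w", "w"),
  stringsAsFactors = FALSE
)

analysis <- data.frame(Col = names(dd), stringsAsFactors = FALSE)

for (i in 1:3) {
  df_group <- dd[dd$group == i, , drop = FALSE]
  # Reset the accumulator for every group so it holds one entry per column
  modes <- character(0)
  for (col in colnames(df_group)) {
    counts <- table(df_group[[col]])  # NAs are dropped by default
    indx <- tail(names(sort(counts)), 1)
    # An all-NA column gives an empty table, so store NA explicitly
    if (length(indx) == 0) indx <- NA_character_
    modes <- c(modes, indx)
  }
  # Add the group's modes as a new named column
  analysis[[paste0("group_", i)]] <- modes
}
print(analysis)
```

Using a name other than `c` for the accumulator also avoids masking the base `c()` function.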

briturr
  • Can you please provide a [MRE](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), i.e. a data example (e.g. by pasting the output of `dput(head(dd))`)? Then it's easier to help you, thanks! – starja Aug 08 '22 at 14:50
  • I'm not sure what your goal is. Are you looking for the most common non-missing value per group? (That's my best guess from your title.) If so, I'd pick an answer from the FAQ "How to find the statistical mode?" (choose one that handles `NA` values, [like this one](https://stackoverflow.com/a/25635740/903061)), and then pick your favorite way to apply a function by group (see the [sum by group FAQ](https://stackoverflow.com/q/1660124/903061) and replace `sum` with `Mode`). With `dplyr`: `df %>% group_by(group) %>% summarize(across(everything(), Mode))` – Gregor Thomas Aug 08 '22 at 15:33
  • I'm trying to get the most common values, [as in this question](https://stackoverflow.com/questions/12187187/how-to-retrieve-the-most-repeated-value-in-a-column-present-in-a-data-frame). The problem is with columns that contain only NAs. The dataset has over 1000 columns. – briturr Aug 08 '22 at 18:41
  • Update: I figured out the problem. I needed to move `clus <- c()` inside the loop, so that it is reset for each group.
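As the comments suggest, the same table can also be built without a growing accumulator by applying an NA-aware mode function to each group. The sketch below uses base R only. The `Mode` helper is a hypothetical implementation written for this example (it ignores `NA`s, returns `NA` when nothing is left, and breaks ties by first occurrence); it is not the code from the linked answer. The toy `dd` is also hypothetical.

```r
# Hypothetical toy data standing in for the real dd
dd <- data.frame(
  group = c(1, 1, 1, 2, 2, 3, 3),
  x     = c("a", "a", "b", "c", "c", "d", "d"),
  y     = c(5, 7, 7, NA, NA, 2, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: most frequent non-NA value of x, or NA if x is all NA.
# Ties go to the value that appears first.
Mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

value_cols <- setdiff(names(dd), "group")

# One character vector of modes per group, in the order of value_cols
per_group <- lapply(split(dd, dd$group), function(g) {
  unname(vapply(g[value_cols], function(col) as.character(Mode(col)), character(1)))
})

# One row per non-group column, one column per group (named "1", "2", "3")
analysis <- data.frame(Col = value_cols, stringsAsFactors = FALSE)
analysis[names(per_group)] <- per_group
print(analysis)
```

The modes are converted to character so that columns of different types can share one result column per group. Each group's vector has exactly one entry per column by construction, so all-NA columns simply show up as `NA`.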
