
I am a total newbie to R and want to apply a specific function to a list of data.frames. The data frames contain rows filled entirely with 0, and I want to delete those rows.

Sample dataframe:

| OTU ID | Sample 1 | Sample 2 | Sample 3 | Sample 4 |
| :--- | :--- | :--- | :--- | :--- |
| abc | 12 | 24 | 0 | 120 |
| bcd | 0 | 0 | 0 | 0 |
| efg | 12 | 24 | 0 | 120 |
| hij | 24 | 9 | 13 | 4 |

For one table the code would be as follows:

#' column 1 holds the row names, so rowSums() should be applied to all columns except column 1

all_zero <- rowSums(table1[,-1]) == 0

#' then the rows that include only 0 should be deleted from the data.frame

table1 <- filter(table1, !all_zero)

As I have 10 different data.frames on which I want to apply the function, I want to create a for-loop or lapply()

#' first I created a list of the data.frames

all_df <- mget(ls()[1:10])

and then I get stuck. Maybe you can help me finalize the options

a) for-loop (Here maybe it is silly to create so many new variables, better to get out a list?)

for (df in all_df) {
  paste0("no_reads_", df) <- rowSums(df[,-1]) == 0
  paste0(names(all_df), "_neu") <- filter(df, !paste0("no_reads_", df))
}

b) lapply (Here I don't know how to include best the second step of the command)

lapply(seq_along(all_df),
       function(df) rowSums(all_df[,-1][[df]]) == 0)

You would help me a lot :) Best, Kathrin

Kathrin

1 Answer


Using a for loop:

library(dplyr)

for (i in seq_along(df_list)) {
  df_list[[i]] <- df_list[[i]] %>%
    rowwise() %>%
    # sum every column except the ID column
    mutate(sum = sum(c_across(-"OTU ID"))) %>%
    # keep only rows with at least one non-zero count
    filter(sum > 0)
}

Using purrr::map()

library(purrr)

df_list <- df_list %>%
  map(~ rowwise(.x) %>%
        mutate(sum = sum(c_across(-"OTU ID"))) %>%
        filter(sum > 0))
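
Since the question also asked about lapply(), here is a minimal base R sketch of the same idea. It reuses the rowSums() check from the question and keeps the "OTU ID" column in the output. The sample values are copied from the table in the question; the names make_table, drop_zero_rows and cleaned are made up for illustration, and the two-element df_list only stands in for the ten real tables. It assumes the first column is the ID column and that all counts are non-negative, so a row sum of 0 means every value in the row is 0.

```r
# Sample data mirroring the table in the question.
# check.names = FALSE keeps the spaces in the column names.
make_table <- function() {
  data.frame(
    `OTU ID`   = c("abc", "bcd", "efg", "hij"),
    `Sample 1` = c(12, 0, 12, 24),
    `Sample 2` = c(24, 0, 24, 9),
    `Sample 3` = c(0, 0, 0, 13),
    `Sample 4` = c(120, 0, 120, 4),
    check.names = FALSE
  )
}

# Hypothetical stand-in for the real list of ten data frames.
df_list <- list(table1 = make_table(), table2 = make_table())

# Hypothetical helper: drop rows whose counts (all columns but the first) sum to 0.
drop_zero_rows <- function(df) {
  all_zero <- rowSums(df[, -1, drop = FALSE]) == 0
  df[!all_zero, , drop = FALSE]
}

# Apply the helper to every data frame; the result is again a named list.
cleaned <- lapply(df_list, drop_zero_rows)
```

The result is a list, so no new variables have to be created per table; a single table can be pulled out with cleaned$table1 or cleaned[["table1"]].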
latlio
  • With the for-loop, rowwise() gives an error because it includes the first column ("OTU ID"), which is of type character. The other columns are numeric. – Kathrin Jan 12 '21 at 14:52
  • oh ok, that wasn't clear from your question. You can insert a `select(-"OTU ID")` before `rowwise()`. Let me know if that fixes the issue – latlio Jan 12 '21 at 15:00
  • Perfect, that solved my problem very quickly :) – Kathrin Jan 12 '21 at 15:07
  • Ah, the "OTU ID" column should be included in the output, and with this command it is dropped. It should only be excluded when calculating the sum. Can we add something that applies rowwise() to everything except column 1? Sorry for the trouble. – Kathrin Jan 12 '21 at 15:22
  • it's difficult for me to debug without you providing a sample dataset, but try the edit that i just posted – latlio Jan 12 '21 at 15:31
  • I added a sample dataframe, because the solution you recommended did not work. Is it more comprehensible now? – Kathrin Jan 12 '21 at 15:41
  • Please use `dput(head(data, 5))` to make a sample dataframe. Please see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – latlio Jan 12 '21 at 15:48
  • Sooo, I think it works if you delete the everything – Kathrin Jan 12 '21 at 15:48