I have a data frame like this:
structure(list(ref = c("1_S126_L006", "1_S126_L006", "1_S126_L006",
"1_S126_L006", "1_S126_L006", "1_S126_L006", "1_S126_L006", "1_S126_L006",
"1_S126_L006", "1_S126_L006", "1_S126_L006", "1_S126_L006", "150_S96_L005",
"150_S96_L005", "150_S96_L005", "150_S96_L005", "150_S96_L005",
"150_S96_L005", "150_S96_L005", "150_S96_L005"), Escherichia_coli_CyaA_1 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, "N142S", "G222S", NA, NA,
NA, NA, NA, NA, NA, NA), Escherichia_coli_EF_Tu = c(".", ".",
".", NA, NA, NA, NA, NA, NA, NA, NA, NA, ".", NA, NA, NA, NA,
NA, NA, NA), Escherichia_coli_GlpT = c(NA, NA, NA, NA, NA, NA,
NA, "E448K", NA, NA, NA, NA, NA, NA, NA, NA, NA, "E448K", NA,
NA), Escherichia_coli_PtsI = c(NA, NA, NA, NA, NA, NA, NA, NA,
"R367K", NA, NA, NA, NA, NA, NA, NA, NA, NA, "R367K", NA), Escherichia_coli_UhpT = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
fabG = c(NA, NA, NA, NA, NA, "D105E", NA, NA, NA, NA, NA,
NA, NA, NA, NA, "D105E", NA, NA, NA, NA), gyrA_8 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, "S83L", NA, NA, NA, NA, NA,
NA, NA, NA, NA, "S83L"), gyrB_1 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_), marR = c(NA, NA, NA, "G103S",
"Y137H", NA, NA, NA, NA, NA, NA, NA, NA, "G103S", "Y137H",
NA, NA, NA, NA, NA), nfsA = c(NA, NA, NA, NA, NA, NA, "Y45C",
NA, NA, NA, NA, NA, NA, NA, NA, NA, "Y45C", NA, NA, NA),
ompF = c(NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), parC_3 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_)), .Names = c("ref", "Escherichia_coli_CyaA_1",
"Escherichia_coli_EF_Tu", "Escherichia_coli_GlpT", "Escherichia_coli_PtsI",
"Escherichia_coli_UhpT", "fabG", "gyrA_8", "gyrB_1", "marR",
"nfsA", "ompF", "parC_3"), row.names = c(NA, -20L), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), vars = "ref", drop = TRUE, indices = list(
0:11, 12:19), group_sizes = c(12L, 8L), biggest_group_size = 12L, labels = structure(list(
ref = c("1_S126_L006", "150_S96_L005")), row.names = c(NA,
-2L), class = "data.frame", vars = "ref", drop = TRUE, .Names = "ref"))
What I want to do is collapse the whole data frame so that I get only one row per entry in the "ref" column. If multiple values are present in the same column, they should be pasted together into the same cell, separated by ",". I have previously used the following to collapse a data frame into one row per entry in the "ref" column:
library(dplyr)
func_paste <- function(x) paste(unique(sum(x, na.rm = T)), collapse = ",")
df %>%
group_by(ref) %>%
summarise_all(funs(func_paste))
This worked on another dataset, but I cannot for the life of me figure out why I keep getting this error:
Error in summarise_impl(.data, dots) :
Evaluation error: invalid 'type' (character) of argument.
I have read a few posts on this error already, like here and here, and they suggested trying group_by(x) %>% summarise_each(funs(sum)), but this only works with numeric data, not character data. As far as I understand, the problem is the sum() function, since my columns contain character data. Any suggestions?
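To illustrate the suspicion about sum(), here is a minimal base R sketch that reproduces the same message outside of dplyr. The vector x is made up for illustration and only mimics one of the character columns; it is not taken from the real pipeline.

```r
# sum() accepts numeric, complex, or logical input only, so a character
# vector fails even when na.rm = TRUE is supplied.
x <- c(NA, "N142S", "G222S")  # illustrative values, shaped like one column

# Capture the error message instead of stopping the script.
res <- tryCatch(
  sum(x, na.rm = TRUE),
  error = function(e) conditionMessage(e)
)
print(res)
```

If this is right, the error comes from sum() itself rather than from group_by() or summarise_all().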
EDIT
If I run it without the sum() function, it seems to do the trick. However, without the na.rm = T part, it now pastes every NA together with the values. How do I make it ignore the NAs?
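One possible direction, sketched here under assumptions rather than as a confirmed fix: drop the NA values explicitly before pasting, since paste() has no na.rm argument. The helper name collapse_unique is hypothetical, and the example vectors are illustrative.

```r
# Hypothetical helper: remove NA first, then collapse the unique
# remaining values into one comma-separated string.
collapse_unique <- function(x) {
  x <- x[!is.na(x)]                 # paste() has no na.rm, so filter by hand
  paste(unique(x), collapse = ",")  # an all-NA column collapses to ""
}

# Illustrative inputs shaped like the columns in the data frame above.
some_values <- collapse_unique(c(NA, "G103S", "Y137H", NA))
all_missing <- collapse_unique(c(NA_character_, NA_character_))
print(some_values)
print(all_missing)
```

Assuming a dplyr version in which summarise_all() accepts a bare function, this could presumably replace func_paste in the pipeline as df %>% group_by(ref) %>% summarise_all(collapse_unique). Note that columns with no values at all would then hold an empty string rather than NA.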