I have a dataframe with categories and numerous corresponding values. Values here are non-numeric. The categories are in triplicate and need to be condensed to one row per category, while also condensing corresponding values.
Current format looks like this:
Df <- data.frame(A = c("category1", "category1", "category1",
"category2", "category2", "category2",
"category3", "category3", "category3"),
B = c("NA", "NA", "a",
"NA", "b", "NA",
"NA", "NA", "a"),
C = c("b", "NA", "NA",
"a", "NA", "NA",
"b", "NA", "NA"),
D = c("NA", "ab", "NA",
"NA", "NA", "ab",
"NA", "NA", "NA"),
E = c("NA", "NA", "NA",
"NA", "NA", "NA",
"NA", "c", "NA"))
I want data to look like this:
Category 3 is a bit of a nuisance as it forces NA values to be introduced. Last column would be empty if not for one "c" in category 3. Category 3 is the only one out of hundreds of categories that has a value in the 5th column and none in the fourth. If it is easier I can perhaps just drop this category for now and add it in manually later.
Originally I had both numeric and non-numeric values but with the following code I was able to condense almost as intended:
data1 <- data %>%
group_by(category) %>%
summarize_all(list(~toString(unique(.))))
However this did not work for the non-numeric columns above, leaving only one column value per category. The answer to the question needs to retain the category column to allow me to merge back with numeric column table.