1

I am trying to create a count table from a data table that looks like this:

df <- data.frame("Spring" = c("skirt, pants, shirt", "tshirt"), "Summer" = 
c("shorts, skirt", "pants, shoes"), Fall = c("Scarf", "purse, pants"))

               Spring        Summer         Fall
1 skirt, pants, shirt shorts, skirt        Scarf
2              tshirt  pants, shoes purse, pants

and then a count table that looks like this in the end:

output <- data.frame("Spring" = 4, "Summer" = 4, Fall = 3)

  Spring Summer Fall
1      4      4    3

So, I would just like it to count the unique values in a column for each season. I am having trouble with this because of the commas separating values within 1 cell. I tried using length(unique())), but it is not giving me the correct number because of the columns.

Any help is appreciated!!!

tmfmnk
  • 38,881
  • 4
  • 47
  • 67
Sarah
  • 411
  • 4
  • 14

2 Answers2

1

One tidyverse possibility could be:

df %>%
 mutate_if(is.factor, as.character) %>%
 gather(var, val) %>%
 mutate(val = strsplit(val, ", ")) %>%
 unnest() %>%
 group_by(var) %>%
 summarise(val = n_distinct(val))

  var      val
  <chr>  <int>
1 Fall       3
2 Spring     4
3 Summer     4

If you want to match the desired output exactly, then you can add spread():

df %>%
 mutate_if(is.factor, as.character) %>%
 gather(var, val) %>%
 mutate(val = strsplit(val, ", ")) %>%
 unnest() %>%
 group_by(var) %>%
 summarise(val = n_distinct(val)) %>%
 spread(var, val)

   Fall Spring Summer
  <int>  <int>  <int>
1     3      4      4

Or using the basic idea from @Sonny (this requires just dplyr):

df %>%
 mutate_if(is.factor, as.character) %>%
 summarise_all(list(~ n_distinct(unlist(strsplit(., ", ")))))

  Spring Summer Fall
1      4      4    3
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
  • Thank you, however, this isn't working for my actual data because I have situations where some of the columns are the same, such as if the table is structured like this in the beginning. Do you know how to fix this??? – Sarah Apr 26 '19 at 18:33
  • df <- data.frame("Spring" = c("skirt, pants, shirt", "tshirt", "skirt, pants, shirt"), "Summer" = c("shorts, skirt", "pants, shoes", "shorts, skirt"), Fall = c("Scarf", "purse, pants", "scarf")) – Sarah Apr 26 '19 at 18:33
  • Can you please describe what exactly is the problem? – tmfmnk Apr 26 '19 at 18:36
  • I did test <- as.data.frame(test) after that line of code. Sorry forgot to put that there. – Sarah Apr 26 '19 at 18:55
  • I don't see how you should get 8 as the answer (I'm getting 10). Please use `dput()` to provide the structure of your real data. – tmfmnk Apr 26 '19 at 19:00
  • Shouldn't it be 8 because there are 8 unique values. They are: SSDU, EGMU, EGTM, EGBU, EGHU, EGCP, SGDU, and SSBU ? – Sarah Apr 26 '19 at 19:02
  • 1
    Now I see the problem. You can use: `df %>% mutate_if(is.factor, as.character) %>% mutate(test = strsplit(test, ", ")) %>% unnest() %>% summarise_all(n_distinct)`. – tmfmnk Apr 26 '19 at 19:12
  • Yay it works! One last question, if I'm doing this for a dataset with multiple columns, is there a way to calculate the counts for each column all at once? I get this error when I try to run the entire dataset "Error in mutate_impl(.data, dots) : Evaluation error: non-character argument.". But it works when I run each column individually. – Sarah Apr 26 '19 at 19:17
  • Yes, then both of the possibilities provided in the post should work. – tmfmnk Apr 26 '19 at 19:19
  • 1
    Great!! Thank you so much. You were a great help!! – Sarah Apr 26 '19 at 19:22
1

Using summarise_all:

getCount <- function(x) {
  x <- as.character(x)
  length(unique(unlist(strsplit(x, ","))))
}

library(dplyr)
df %>%
  summarise_all(funs(getCount))
  Spring Summer Fall
1      4      4    3
Sonny
  • 3,083
  • 1
  • 11
  • 19
  • Thank you, but I'm not getting the correct answer with my data. This is what I have: This is my actual data: test <- c("SSDU, EGMU, EGTM", "EGBU", "SSDU, EGMU, EGTM", "EGBU", "EGHU", "EGMU", "EGBU", "SSDU, EGCP", "EGHU", "SGDU, SDU, SSBU") test<- as.data.frame(test) With your code, I get that the answer is 10, but it should be 8 – Sarah Apr 26 '19 at 18:57