0

I have a dataframe named "test" like this:

      group   v   x
1      a    110 114
2      a    90 150
3      c    57 100
4      d    53  98
5      e   114  67
6      f   143 126
7      g   110  95
8      g   106 101
9      i   103  70
10     j   149  73

I also have a dataframe of unique group values called "hold_df"

  groups   
1      a 
2      c  
3      d  
4      e 
5      f 
6      g 
7      i 
8      j 

I want to add columns to hold_df that include the count of unique occurences in the test dataframe, and the number of unique occurences when column v is over a certain threshold (e.g. 100)

  groups  unique uniqueConditional  
1      a    2       1
2      c    1       0
3      d    1       0
4      e    1       1
5      f    1       1
6      g    2       2
7      i    1       1
8      j    1       1
www
  • 38,575
  • 12
  • 48
  • 84
NBC
  • 1,606
  • 4
  • 18
  • 31
  • 2
    Relevant posts - https://stackoverflow.com/questions/9809166/count-number-of-rows-within-each-group and https://stackoverflow.com/questions/7450600/count-number-of-observations-rows-per-group-and-add-result-to-data-frame and https://stackoverflow.com/questions/11580650/aggregate-count-rows-that-match-a-condition-group-by-unique-values - probably more out there too. – thelatemail Jul 31 '17 at 23:02

2 Answers2

1

A solution from dplyr. We can use group_by and summarise to summarise the test data frame. test2 is the final output.

library(dplyr)

test2 <- test %>%
  group_by(group) %>%
  summarise(unique = n(), uniqueConditional = sum(v > 100))

test2
    # A tibble: 8 x 3
  group unique uniqueConditional
  <chr>  <int>             <int>
1     a      2                 1
2     c      1                 0
3     d      1                 0
4     e      1                 1
5     f      1                 1
6     g      2                 2
7     i      1                 1
8     j      1                 1

In this case, test2 is the same as your desired output based on hold_df. However, if your hold_df is a subset of the test2. We can do the following to filter the desired group.

test3 <- test2 %>% semi_join(hold_df, by = c("group" = "groups")) 

Data Preparation

test <- read.table(text = "      group   v   x
1      a    110 114
                   2      a    90 150
                   3      c    57 100
                   4      d    53  98
                   5      e   114  67
                   6      f   143 126
                   7      g   110  95
                   8      g   106 101
                   9      i   103  70
                   10     j   149  73",
                   header = TRUE, stringsAsFactors = FALSE)

hold_df <- read.table(text = "  groups   
1      a 
                      2      c  
                      3      d  
                      4      e 
                      5      f 
                      6      g 
                      7      i 
                      8      j ",
                      header = TRUE, stringsAsFactors = FALSE)
www
  • 38,575
  • 12
  • 48
  • 84
  • Thanks! The test3 line of code gives me an error though. Are you sure about it? – NBC Aug 01 '17 at 02:53
  • You got an error from the example datasets or your actual datasets? Without more information, I have no idea what happened. – www Aug 01 '17 at 03:22
0

With base functions :

hold_df <- cbind(
  setNames(data.frame(table(test$group)),c("groups","unique")),
  unique_conditional = data.frame(table(subset(test,v>100)$group))[,2])

#   groups unique unique_conditional
# 1      a      2                  1
# 2      c      1                  0
# 3      d      1                  0
# 4      e      1                  1
# 5      f      1                  1
# 6      g      2                  2
# 7      i      1                  1
# 8      j      1                  1
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167