Sorry about my English; I need some help.

With this dataset:

+--------+------------+---------+---------+----------+
| PEOPLE |    DATE    | EVENT_A | EVENT_B | EVENT_C  |
+--------+------------+---------+---------+----------+
| MIKE   | 04/08/2013 |       1 |       1 |        1 |
| PETE   | 10/08/2013 |       1 |       0 |        1 |
| PETE   | 25/08/2013 |       1 |       0 |        1 |
| PETE   | 15/09/2013 |       1 |       0 |        1 |
| MIKE   | 28/09/2013 |       1 |       1 |        1 |
| PETE   | 19/10/2013 |       1 |       1 |        1 |
| MIKE   | 30/10/2013 |       0 |       1 |        1 |
| MIKE   | 09/11/2013 |       1 |       1 |        1 |
+--------+------------+---------+---------+----------+

Basically, I need to count, for each person, the rows in which every event of a given combination of n events has the value 1. I don't know what approach to take to achieve this in R, for example. The output should look something like this:

+-------+-------+------------------------+---------+---------+--------+
| #MIKE | #PETE | #N EVENTS COMBINATIONS |         |         |        |
+-------+-------+------------------------+---------+---------+--------+
|     3 |     1 | COMBINATIONS WITH 2    | EVENT A | EVENT B |        |
|     2 |     4 | COMBINATIONS WITH 2    | EVENT A | EVENT C |        |
|     4 |     1 | COMBINATIONS WITH 2    | EVENT B | EVENT C |        |
|     3 |     2 | COMBINATIONS WITH 3    | EVENT A | EVENT B | EVENT C|
+-------+-------+------------------------+---------+---------+--------+

I need this for every person and for any number of unique events (columns).

Thanks in advance.

Vince
  • please include code blocks, see https://meta.stackoverflow.com/questions/251361/how-do-i-format-my-code-blocks – MichaelChirico Aug 08 '17 at 21:02
  • For us to help you, you need to at least give us the data to work with and the expected results, not as a JPG but as a data frame. Please check the link given above – Onyambu Aug 09 '17 at 02:03
  • Sorry, I'll try to elaborate and edit the question with additional information. Thank you – Vince Aug 09 '17 at 05:39
  • Please make sure any data you share is [reproducible](http://stackoverflow.com/questions/5963269) – Sotos Aug 09 '17 at 07:01

1 Answer

One possibility is to use dplyr (with the pipe operator, %>%) and tidyr.

Given your data, I would solve your problem like so:

library(dplyr)  # for data manipulation and piping
library(tidyr)  # for data reshaping

# create the data (tibble() is re-exported by dplyr and replaces the deprecated data_frame())
df <- tibble(
 people = c("Mike", "Pete", "Pete", "Pete", "Mike", "Pete", "Mike", "Mike"),
 event_a = c(rep(1, 6), 0, 1),
 event_b = c(1, 0, 0, 0, rep(1, 4)),
 event_c = c(rep(1, 8))
)

# create a dummy var for each event-combination
df2 <- df %>% 
 mutate(ab = event_a & event_b,
        ac = event_a & event_c,
        bc = event_b & event_c,
        abc = event_a & event_b & event_c)

# reshape data to the long format using tidyr::gather
df3 <- df2 %>% 
 # we don't need the original events anymore -> deselect them
 select(-contains("event")) %>% 
 # reshape from wide to long
 gather("var", "value", -people) %>%
 # keep only the positive matches (value is already logical)
 filter(value)

df3 %>% 
 # for each combination ...
 group_by(var) %>% 
 # ... count the number of cases
 summarise(n_mike = sum(people == "Mike"),
           n_pete = sum(people == "Pete")) %>%
 # create the text-variable
 mutate(event_combs = sprintf("Combinations with %d", nchar(var))) %>% 
 # reorder to have it your format
 select(n_mike, n_pete, event_combs, var)
#> # A tibble: 4 x 4
#>   n_mike n_pete         event_combs   var
#>    <int>  <int>               <chr> <chr>
#> 1      3      1 Combinations with 2    ab
#> 2      3      1 Combinations with 3   abc
#> 3      3      4 Combinations with 2    ac
#> 4      4      1 Combinations with 2    bc

Generalization

To generalize this to an arbitrary number of events (up to 26 for now, because the combination labels are built from single letters; extending that is left as an exercise), we can use expand.grid() to generate all possible event combinations and then use apply() to find the rows that match each combination.
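To make the grid idea concrete, here is a small base R sketch for three events. The labels a, b and c are only illustrative stand-ins for the event columns.

```r
# Every on/off pattern for three events: 2^3 = 8 rows
grid <- expand.grid(rep(list(0:1), 3))
names(grid) <- c("a", "b", "c")  # illustrative labels, one per event

# The first row is all zeros (no event selected), so drop it
event_combs <- grid[-1, ]

# Turn each remaining row into a label such as "a", "ab" or "abc"
labels <- apply(event_combs, 1, function(row) {
  paste(names(grid)[as.logical(row)], collapse = "")
})
unname(labels)
```

Each of the seven remaining rows is one subset of the events, and the label records which events belong to it.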

The code would look like this:

df <- tibble(
 people = c("Mike", "Pete", "Pete", "Pete", "Mike", "Pete", "Mike", "Mike"),
 event_a = c(rep(1, 6), 0, 1),
 event_b = c(1, 0, 0, 0, rep(1, 4)),
 event_c = c(rep(1, 8)),
 event_d = c(1, 1, 0, 0, 0, 1, 0, 1)
)

# take only the events
df_events <- df %>% select(starts_with("event"))
# create all possible event combinations
# also: discard the first row (all zeros, i.e. no event selected)
event_combs <- expand.grid(rep(list(0:1), ncol(df_events)))[-1, ]

# 'loop' over the possible combinations, and find the matches
res_list <- apply(event_combs, 1, function(row) {
 # row now contains which events we choose
 row <- as.logical(row)
 # var now contains the labels of the chosen events, e.g. 'a', 'abc', or 'bc'
 var <- paste(letters[seq_along(row)][row], collapse = "")

 # combine the data into a tibble
 tibble(var = var,
        people = df$people,
        # check whether, per row, all selected events are 1
        value = rowSums(df_events[, row]) == sum(row))
})

# bind the results together
df3 <- bind_rows(res_list) %>% filter(value)

# same as before...
df3 %>% 
 group_by(var) %>% 
 summarise(n_mike = sum(people == "Mike"),
           n_pete = sum(people == "Pete"))
#> # A tibble: 15 x 3
#>      var n_mike n_pete
#>    <chr>  <int>  <int>
#>  1     a      3      4
#>  2    ab      3      1
#>  3   abc      3      1
#>  4  abcd      2      1
#>  5   abd      2      1
#>  6    ac      3      4
#>  7   acd      2      2
#>  8    ad      2      2
#>  9     b      4      1
#> 10    bc      4      1
#> 11   bcd      2      1
#> 12    bd      2      1
#> 13     c      4      4
#> 14    cd      2      2
#> 15     d      2      2
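The letter labels cap the approach above at 26 events, and the result also contains single-event rows that the question did not ask for. As a rough sketch of one way around both, combn() can enumerate only the subsets of a given size and keep the real column names. This version uses base R only; count_combinations() and its arguments are hypothetical names made up for this sketch, not part of any package.

```r
# Toy data from the example above, as a plain data frame
df <- data.frame(
  people  = c("Mike", "Pete", "Pete", "Pete", "Mike", "Pete", "Mike", "Mike"),
  event_a = c(rep(1, 6), 0, 1),
  event_b = c(1, 0, 0, 0, rep(1, 4)),
  event_c = rep(1, 8),
  event_d = c(1, 1, 0, 0, 0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# count_combinations() is a hypothetical helper written for this sketch
count_combinations <- function(data, event_cols, group_col = "people",
                               min_size = 2) {
  stopifnot(min_size >= 1, min_size <= length(event_cols))
  groups <- as.character(unique(data[[group_col]]))

  # all subsets of the event columns with at least min_size members
  combos <- unlist(
    lapply(min_size:length(event_cols),
           function(k) combn(event_cols, k, simplify = FALSE)),
    recursive = FALSE
  )

  rows <- lapply(combos, function(cols) {
    # TRUE for rows where every selected event equals 1
    hit <- rowSums(data[, cols, drop = FALSE] == 1) == length(cols)
    # number of matching rows per person
    n <- vapply(groups, function(g) sum(hit & data[[group_col]] == g),
                integer(1))
    cbind(
      data.frame(n_events = length(cols),
                 events = paste(cols, collapse = " & "),
                 stringsAsFactors = FALSE),
      as.data.frame(as.list(n))
    )
  })
  do.call(rbind, rows)
}

event_cols <- grep("^event_", names(df), value = TRUE)
res <- count_combinations(df, event_cols)
res
```

With the four-event data this gives one row per combination of two or more events (11 rows), with one count column per person. Note that the number of combinations still grows exponentially with the number of event columns, so this is only practical for a modest number of events.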
David
  • Thank you David, awesome. It would be great if the "mutate" step could work for any number of columns instead of creating the combinations by hand. Could that be done? – Vince Aug 09 '17 at 09:21
  • Yes. Thank you so much David – Vince Aug 09 '17 at 11:49
  • Sorry David, the df3 step gives me this error: Error in filter_impl(.data, quo) : Evaluation error: object 'value' not found. – Vince Aug 09 '17 at 12:59
  • Still works on my machine, make sure that you have the latest dplyr version installed (currently at 0.7.2). – David Aug 09 '17 at 15:02