1

I have a data frame with two columns: LOOKUP which contains an index number, and the other, VALUE which holds a string associated with that number:

LOOKUP    VALUE
1000      Apple
100       Banana
10        Grape
1         Orange

Shown below in R code:

dat <- data.frame(LOOKUP= c(1000, 100, 10, 1),
                  VALUE = c("Apple", "Banana", "Grape", "Orange"))

In this example, there are 15 possible sums of the lookup value:

  • 4 sums where only one number summed (1000, 100, 10, 1)
  • 6 sums of two numbers (1000 + 100, 1000 + 10, 1000 + 1, 100 + 10, 100 + 1, 10 + 1)
  • 4 sums of three numbers (1000 + 100 + 10, 1000 + 100 + 1, 1000 + 10 + 1, 100 + 10 + 1)
  • 1 sum of four numbers (1000 + 100 + 10 + 1)

These sums are used to understand, for lack of a better description, what fruits are in a cart. For example, if the sum = 1100, we know that the cart has an apple and a banana. Does anybody know how I would do this for all possible combinations in my set? My desired output is a new data frame along the lines of:

SUM    VALUES
1111   Apple, Banana, Grape, Orange
...
1100   Apple, Banana
...
11     Grape, Orange
1      Apple
DJC
  • 1,491
  • 6
  • 19
  • 2
    Related: https://stackoverflow.com/questions/36197018/how-to-get-all-possible-subsets-of-a-character-vector-in-r/36197578 – MrFlick Feb 05 '20 at 16:52

1 Answers1

1

One solution involving purrr, tibble, dplyr and tidyr could be:

map(.x = 1:nrow(dat), ~ combn(dat$VALUE, .x, paste, collapse = ",") %>%
     enframe()) %>%
 bind_rows() %>%
 rowid_to_column() %>%
 separate_rows(value) %>%
 left_join(dat, by = c("value" = "VALUE")) %>%
 group_by(rowid) %>%
 summarise(comb = toString(value),
           sum = sum(LOOKUP))

  rowid comb                           sum
   <int> <chr>                        <dbl>
 1     1 Apple                         1000
 2     2 Banana                         100
 3     3 Grape                           10
 4     4 Orange                           1
 5     5 Apple, Banana                 1100
 6     6 Apple, Grape                  1010
 7     7 Apple, Orange                 1001
 8     8 Banana, Grape                  110
 9     9 Banana, Orange                 101
10    10 Grape, Orange                   11
11    11 Apple, Banana, Grape          1110
12    12 Apple, Banana, Orange         1101
13    13 Apple, Grape, Orange          1011
14    14 Banana, Grape, Orange          111
15    15 Apple, Banana, Grape, Orange  1111

Sample data:

dat <- data.frame(LOOKUP= c(1000, 100, 10, 1),
                  VALUE = c("Apple", "Banana", "Grape", "Orange"),
                  stringsAsFactors = FALSE)
tmfmnk
  • 38,881
  • 4
  • 47
  • 67
  • Thanks so much! I'm trying to run your code but am getting the following error message: 'Error in class(out) <- class(x0) : adding class "factor" to an invalid object'. Any chance there were additional libraries outside of purrr, dplyr, and tidyr that you were using? – DJC Feb 05 '20 at 17:23
  • I used `stringsAsFactors = FALSE` when importing the data. Please see the updated post. – tmfmnk Feb 05 '20 at 17:24
  • You're a legend. Thanks much. Just a heads up, I think need to add 'Tibble' to your list of required libraries. I needed to load that for enframe. Thanks again :) – DJC Feb 05 '20 at 17:26
  • You are right, I always consider it to be a part of `dplyr` :) Glad it helped! – tmfmnk Feb 05 '20 at 17:28