Create groups where the letters appear one and only once time in Rstudio

Question

I expose my problem. I have these 3 dataframe:

single

     Mat         Price
     A           1029.90
     B            568.52
     C            497.12
     D            573.50
     E            217.92

double

         Mat1  Mat2        Price
         A      C         1529.61
         A      D         1623.49
         A      E         1325.86
         B      C         1050.64
         B      D         1146.65
         B      E         849.02
         C      D         999.42
         C      E         700.03
         D      E         776.41
triple
            Mat1  Mat2   Mat3     Price
             B      C      D     1564.98
             B      C      E     1267.30
             C      D      E     1350.45
             B      D      E     1202.33

Using these 3 dataframes I have to build another dataframe (or list) that gives me all the possible combinations that contain once and only letters from A to E. For example, some possible combinations can be: A | B | C | D | E | taken from the dataframe single, or AC taken from the dataframe double and B | D | E taken from the dataframe single, or BCD taken from the dataframe triple and e AE taken from the dataframe double and so on for all combinations.

Moreover for every combination found I would like to dedicate a column of the dataframe to the prices calculated as:

(sum of the price from which the material was taken) + 500 * (number of groupings used in the combination).

Taking up the previous examples: the combination A | B | C | D | E | taken all in the dataframe single will have a price equal to (1029.90 + 568.52 + 497.12 + 573.50 + 217.92) + 500 * (5)

The combination AC | B | D | E will have a price equal to (1529.61 + 568.52 + 573.50 + 217.92) + 500 * (4)

The AE | BCD combination will have a price equal to (1325.86 + 1564.98) + 500 * (2)

Is it possible to create a function in R that automates this process? Thanks so much

Is this the full extent of the data? If it isn't, is it because there are more letters or more combinations in `double` and `triple`? — Calum You, Apr 03 '18 at 18:31
This is the complete data because I put a constraint on the creation of dataframes. In general I need a function that does the procedure described above for any single, double or triple dataframe — Bort54, Apr 04 '18 at 07:02
I was thinking about it yesterday and it seems a little more complex than I was realising (https://stackoverflow.com/questions/4632322/finding-all-possible-combinations-of-numbers-to-reach-a-given-sum). figuring out a solution but if you explained where these dataframes come from or if you know whether the combinations will always be the same, it would probably be simpler — Calum You, Apr 05 '18 at 19:57
My initial situation is as follows: I have a starting dateframe that shows me the daily sales (of fruit and vegetables). I have to distribute materials A, B, C, D and E. The first thing I have to do is check the suppliers in order to buy these materials. Each material is associated with an availability that requires the customer and as I have to organize the truck that has a limited size (34 pallets) I can only associate some of these materials. — Bort54, Apr 06 '18 at 15:39
That's why there are not all combinations of double and triple A, B, C, D and E. But to do this I've already written a function in R that output the three dataset written above or 'single', 'double' and 'triple'. — Bort54, Apr 06 '18 at 15:39
In this example I'd like to get a table similar to this with R: https://imageshack.com/a/img923/6279/p9Jakq.png — Bort54, Apr 09 '18 at 09:48

score 0 · Accepted Answer · answered Apr 09 '18 at 19:13

Here is my probably-overly-convoluted answer. Not knowing how much your different dataframes can actually vary, I made it so that it can deal with all other combinations that could occur in double and triple, but there might be a much more streamlined way to do it that I haven't thought of. The hard part I think is using crossing and then some different filter and distinct calls to get it down to just the 29 combinations that you want. After that, it is just a gather and summarise to produce the right numbers. I left_join back onto the originals so you can see what went into each result. If you know these 29 combinations will always be the same and hardcoded, it is a lot simpler.

library(tidyverse)
single <- read_table2("Mat         Price
A           1029.90
B            568.52
C            497.12
D            573.50
E            217.92")
double <- read_table2("Mat1  Mat2        Price
A      C         1529.61
A      D         1623.49
A      E         1325.86
B      C         1050.64
B      D         1146.65
B      E         849.02
C      D         999.42
C      E         700.03
D      E         776.41")
triple <- read_table2("Mat1  Mat2   Mat3     Price
B      C      D     1564.98
B      C      E     1267.30
C      D      E     1350.45
B      D      E     1202.33")

# Create vectors of letter groups
ones <- single$Mat
twos <- str_c(double$Mat1, double$Mat2)
threes <- str_c(triple$Mat1, triple$Mat2, triple$Mat3)

# Create vector of permutations of ABCDE
options <- ones %>%
  gtools::permutations(5, 5, .) %>%
  as_tibble() %>%
  unite("option", V1:V5, sep = "") %>%
  `[[`(1)

# Create table with prices to join combinations onto
lookup <- bind_rows(
 single %>% mutate(rowcomb = Mat),
 double %>% unite(rowcomb, Mat1:Mat2, remove = FALSE, sep = ""),
 triple %>% unite(rowcomb, Mat1:Mat3, remove = FALSE, sep = "")
) %>%
  select(rowcomb, price = Price)

# Get all possible combinations of letter groups that could make five letters
combinations <- bind_rows(
  crossing(ones, ones, ones, ones, ones),
  crossing(ones, ones, ones, twos),
  crossing(ones, ones, threes),
  crossing(ones, twos, twos),
  crossing(twos, threes)
) %>%
  mutate_all(~ replace_na(., "")) %>%
  unite("string", ones:twos1, sep = "", remove = FALSE) %>%
  filter(string %in% options) %>% # Remove any that have repeated letters
  # Add column with the number of elements in each combination
  bind_cols(groupings = pmap_int(.[, -1], function(...) sum(c(...) != ""))) %>%
  bind_cols( # Add column that lets us remove different permutations of the same element combinations
    row = .[, 2:9] %>%
      pmap(function(...) c(...)) %>%
      map_chr(~str_c(str_sort(.), collapse = "_"))
  ) %>%
  distinct(row, .keep_all = TRUE) %>%
  rowid_to_column(var = "comb_id") # 29 final combinations

total_price <- combinations %>%
  gather("table", "letter", ones:twos1) %>%
  left_join(lookup, by = c("letter" = "rowcomb")) %>%
  group_by(comb_id) %>%
  summarise(total_price = sum(price, na.rm = TRUE))

output <- left_join(combinations, total_price, by = "comb_id") %>%
  mutate(end_price = total_price + 500 * groupings) %>%
  select(comb_id, row, groupings, end_price)

output
#> # A tibble: 29 x 4
#>    comb_id row          groupings end_price
#>      <int> <chr>            <int>     <dbl>
#>  1       1 ___A_B_C_D_E         5     5387.
#>  2       2 ____A_B_C_DE         4     4872.
#>  3       3 ____A_B_CE_D         4     4872.
#>  4       4 ____A_B_CD_E         4     4816.
#>  5       5 ____A_BE_C_D         4     4950.
#>  6       6 ____A_BD_C_E         4     4892.
#>  7       7 ____A_BC_D_E         4     4872.
#>  8       8 ____AE_B_C_D         4     4965.
#>  9       9 ____AD_B_C_E         4     4907.
#> 10      10 ____AC_B_D_E         4     4890.
#> # ... with 19 more rows

Created on 2018-04-09 by the reprex package (v0.2.0).

Create groups where the letters appear one and only once time in Rstudio

1 Answers1