Add rows with possible combinations in R dataframe

Question

First of all, please accept my apology for a poor title. I'm sure there could be a better one, but I lack the proper English / math terminology to phrase it the right way. I'm also pretty sure my problem is rather easy, but due to some basic math ignorance I don't even know how to phrase a decent Google search.

I'm trying to find all possible combinations of items within pairs of columns.

Given a data frame like this:

```
data.frame(obj1 = c("A", "B", "C", "D", "E", "F"),
           obj2 = c("B", "C", "D", "E", "F", "A"),
           obj3 = c("C", "D", "E", "F", "A", "B"),
           obj4 = c("D", "E", "F", "A", "B", "C"),
           obj5 = c("E", "F", "A", "B", "C", "D"),
           obj6 = c("F", "A", "B", "C", "D", "E"))
```

```
  obj1 obj2 obj3 obj4 obj5 obj6
1    A    B    C    D    E    F
2    B    C    D    E    F    A
3    C    D    E    F    A    B
4    D    E    F    A    B    C
5    E    F    A    B    C    D
6    F    A    B    C    D    E
```

I'd like to add new rows in a way that for every pair of columns (obj1-obj2, obj1-obj3, obj1-obj4, ..., obj5-obj6) all combinations of items appear.

For example, in the first column pair (obj1-obj2), item A appears only with items B and F. The other item combinations are missing, and those are what I want to get.

Caveats:
- each item (A,...,F) can appear only once within each row of the dataframe
- pairs of the same letters (e.g. A-B and B-A) are treated as duplicates within a row, but not within a column

Effectively I'd like to grow this dataframe rowwise so that when a random pair of columns is picked, every combination of 6 items is present.
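One way to see what is missing is to count, for every pair of columns, how many distinct unordered item pairs the rows already cover. The sketch below is only an illustration: `pair_coverage` is a hypothetical helper name, and it assumes that A-B and B-A count as the same pair, as in the second caveat.

```r
# Starting design from the question: six cyclic shifts of A..F
df <- data.frame(obj1 = c("A", "B", "C", "D", "E", "F"),
                 obj2 = c("B", "C", "D", "E", "F", "A"),
                 obj3 = c("C", "D", "E", "F", "A", "B"),
                 obj4 = c("D", "E", "F", "A", "B", "C"),
                 obj5 = c("E", "F", "A", "B", "C", "D"),
                 obj6 = c("F", "A", "B", "C", "D", "E"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: for every pair of columns, count how many distinct
# unordered item pairs (A-B is treated the same as B-A) occur across the rows.
pair_coverage <- function(d) {
  m <- as.matrix(d)
  col_pairs <- combn(ncol(m), 2)          # one column of indices per column pair
  covered <- apply(col_pairs, 2, function(p) {
    a <- m[, p[1]]
    b <- m[, p[2]]
    # Sort each pair alphabetically so that A-B and B-A get the same label
    length(unique(paste(pmin(a, b), pmax(a, b), sep = "-")))
  })
  data.frame(col1 = colnames(m)[col_pairs[1, ]],
             col2 = colnames(m)[col_pairs[2, ]],
             covered = covered,
             stringsAsFactors = FALSE)
}

coverage <- pair_coverage(df)
target <- choose(6, 2)   # number of unordered pairs of 6 distinct items
coverage
```

With 6 items there are `choose(6, 2)` = 15 unordered pairs to cover in each of the 15 column pairs. For the starting table, obj1-obj2 covers only 6 of them and obj1-obj4 only 3.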

My gut tells me that I'm looking at a 90 x 6 dataframe, but that's just intuition: I'm not able to put it into a formula or explain how I came up with this number :)

If my question is not clear, the answer is obvious, or it violates any rule in some other way, please let me know, so I can try to explain myself.

EDIT
After receiving all the comments I'll try to explain myself more clearly.

Consider this simple table of experimental conditions, let's call it table1 and treat it as a between subject table:

In this simpler case, each participant will be presented with 6 pairs of target-items (A, B, C) taken from columns (col1, col2, col3) in the following way (table2 - within subject table):

Those 6 trials guarantee that for every participant within each pair every combination of target-items is present.

If I were to present 6 different traits (one per trial from table2) in a fixed order (for example: happy, sad, smart, bored, confused, tired) for every participant, then after 3 participants each trait would have been presented with every combination of target-items.
For participant 1 - trait happy would be presented with targets A - B
For participant 2 - trait happy would be presented with targets B - C
For participant 3 - trait happy would be presented with targets C - A
NOTE that a (theoretical) set of B - A would be considered a duplicate.

What I'm looking for is a way of extending table1 from the above example of 3 items, into a 6 item table1. Naturally table2 will grow as well, but that's taken care of. Table1 is what is causing me problems.

This is what a starting point could look like

Thank you for any help. Best regards.

Possible duplicate of https://stackoverflow.com/q/11095992/3358272 — r2evans, Apr 09 '18 at 20:07
I found that question prior to posting but didn't know how to use the solutions that are provided - specifically - how to deal with duplicates — blazej, Apr 09 '18 at 20:14
What about this one? https://stackoverflow.com/questions/17171148/non-redundant-version-of-expand-grid — HFBrowning, Apr 09 '18 at 20:14
Or this: https://stackoverflow.com/questions/12245213/how-to-generate-all-possible-combinations-of-vectors-without-caring-for-order There are a lot of pseudo- or real duplicates that may be worth looking at — HFBrowning, Apr 09 '18 at 20:16
`do.call(rbind, combinat::permn(letters[1:3]))` is a good example. (`expand.grid` gives all combinations, not permutations, so it will require specific post-generation filtering. If you had two or perhaps three columns, it might suffice, but more than that and it becomes a little overkill to generate and discard that much.) — r2evans, Apr 09 '18 at 20:17
@HFBrowning, r2evans I think none of those answers does what I'm looking for. For example, combinat::permn will give all possibilities (no caveats taken into account). This is something I could do by hand to simply set up all the conditions. Here I'm specifically looking for unique items within each pair given the structure provided. — blazej, Apr 09 '18 at 20:28
To be clear, the second caveat means that if we have `A-B-C-D-E-F`, all examples with swaps like `B-A-C-D-E-F`, `A-B-D-C-E-F`, even `B-A-D-C-F-E` should all not be present in the final output? What about swapping non-neighbour columns, like `C-B-A-D-E-F`? It's not clear to me now what the expected output is — Calum You, Apr 09 '18 at 22:52
ok, reading again I think you need to try to define this a little more carefully, perhaps use fewer letters/columns and give an exact description of what would and wouldn't be a duplicate. you say "when a random pair of columns is picked, every combination of 6 items is present." So each of the 15 pairs of columns must have each of the `15` combinations of different letters in it. That doesn't really make it obvious what a duplicate is, because whether or not a row contributes to that condition depends on all the other rows! — Calum You, Apr 09 '18 at 23:03
Thanks Calum, I'll edit my question today and describe it in my usage context more carefully — blazej, Apr 10 '18 at 06:41
@CalumYou I did my best with the edit and a simpler example. Is that more clear now? — blazej, Apr 10 '18 at 10:34
First, this isn't really a programming question in R or any other language. I *think* I understand what you're trying to do, but it's not something that programming is going to solve for you - you actually need to work through the combinations. Are ``A B C D E F`` and ``A B D E F C`` both valid or not? As I understood it before, they would be two valid combinations - but as you're explaining it now, it seems that they may not be. In terms of combinatorics, you have 6 options for your first position and 5 options for your 2nd position - for 30 options total. Anything else will be repeats. — Melissa Key, Apr 12 '18 at 03:44

Melissa Key · Answer 1 · 2018-04-09T22:22:04.703

1

Looking at this again, consider the following start positions, which can be used to generate 6 combinations, as shown in the answer by Maurits Evers.

A B C D E F
A C E B D F
A E D C B F
A D B E C F

This family is generated by taking positions 1 3 5 2 4 6 of the previous ordering. With all the shifts of each of the 4 orderings, this is 4 x 6 = 24 different orderings.
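As a rough sketch of that step (the helper name `restride` is made up here for illustration, and `shift` is the circular shift from the other answer), the four start positions and their 24 shifted orderings could be generated like this:

```r
# Hypothetical helper: take positions 1 3 5 2 4 6 of the previous ordering
restride <- function(x) x[c(1, 3, 5, 2, 4, 6)]

# Circular left shift by n, as defined in the answer by Maurits Evers
shift <- function(x, n = 1) {
  if (n == 0) x else c(tail(x, -n), head(x, n))
}

start <- c("A", "B", "C", "D", "E", "F")

# Re-index repeatedly until the starting ordering comes back
family <- list(start)
nxt <- restride(start)
while (!identical(nxt, start)) {
  family[[length(family) + 1]] <- nxt
  nxt <- restride(nxt)
}
starts <- do.call(rbind, family)   # one start position per row

# All six circular shifts of every start position
rows <- do.call(rbind, lapply(family, function(v) {
  do.call(rbind, lapply(0:5, function(i) shift(v, i)))
}))

starts
nrow(rows)
```

The loop stops after four orderings because the fifth re-indexing returns to `A B C D E F`.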

You can generate 24 more in a similar fashion by starting with

F E D C B A

I'm pretty sure we can get another by

A B C E F D

which probably means that the following are also good:

A C F B E D
A F E C B D
A E B F C D

(and likewise for all the F -> A ones).

That makes 96. Did I miss any or duplicate any?
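The duplicate part of that question can at least be checked mechanically. The following is a sketch under one assumption: I read "likewise for all the F -> A ones" as starting from the reversed ordering `D F E C B A`. The names `restride` and `family_of` are invented for this example.

```r
# Circular left shift by n, as defined in the answer by Maurits Evers
shift <- function(x, n = 1) {
  if (n == 0) x else c(tail(x, -n), head(x, n))
}

# Hypothetical helper: take positions 1 3 5 2 4 6 of the previous ordering
restride <- function(x) x[c(1, 3, 5, 2, 4, 6)]

# Hypothetical helper: the four start positions reached from one seed
family_of <- function(seed) {
  out <- list(seed)
  for (i in 1:3) out[[i + 1]] <- restride(out[[i]])
  out
}

seeds <- list(c("A", "B", "C", "D", "E", "F"),
              c("F", "E", "D", "C", "B", "A"),
              c("A", "B", "C", "E", "F", "D"),
              c("D", "F", "E", "C", "B", "A"))  # assumption: reverse of the third seed

# 4 seeds x 4 start positions, flattened into one list of 16 orderings
starts <- unlist(lapply(seeds, family_of), recursive = FALSE)

# All six circular shifts of every start position
rows <- do.call(rbind, lapply(starts, function(v) {
  do.call(rbind, lapply(0:5, function(i) shift(v, i)))
}))

nrow(rows)            # total number of orderings
anyDuplicated(rows)   # 0 means no row occurs twice
```

Under this reading the check gives 96 rows with no duplicated row. It says nothing about whether any orderings are missing.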

edited Apr 09 '18 at 22:22

answered Apr 09 '18 at 21:04

Melissa Key



(after posting this, I saw the caveats you're after - I'm working on a modification now) – Melissa Key Apr 09 '18 at 21:08
Melissa, thanks. If you can find a solution with respect to the caveats it will be a great help to me. – blazej Apr 10 '18 at 06:38
That update is done. It's not really a programming question at this point - I came up with 96 combinations which should meet your requirements. I do not know if there are others - I was systematic, but this isn't my area. – Melissa Key Apr 10 '18 at 06:47
Oh, sorry. I looked at the answer from my mobile. Will inspect it closer when I arrive to my office. Thank you – blazej Apr 10 '18 at 06:48
Apologies, but I'm having some trouble replicating the logic of what you described. Any chance you could make it more explicit? If you can, have a look at the edited part of my question – blazej Apr 10 '18 at 17:17

Maurits Evers · Answer 2 · 2018-04-09T23:26:17.283

1

Note: I am unsure about what you're after. My answer does not seem to address your question, however I will leave it up because of the reference in @MelissaKey's answer.

Define a shift function that circularly shifts the entries of a vector by n positions to the left:
```
shift <- function(x, n = 1) {
  # n == 0 is handled separately because tail(x, 0) and head(x, 0) are both empty
  if (n == 0) x else c(tail(x, -n), head(x, n))
}
```

If we now start from an initial vector v which corresponds to the first row of your expected output data.frame

```
v <- c("A", "B", "C", "D", "E", "F")
```

we can reproduce your expected output by rbinding successively shifted versions of v

```
do.call(rbind, lapply(0:(length(v) - 1), function(i) shift(v, i)))
#    [,1] [,2] [,3] [,4] [,5] [,6]
#[1,] "A"  "B"  "C"  "D"  "E"  "F"
#[2,] "B"  "C"  "D"  "E"  "F"  "A"
#[3,] "C"  "D"  "E"  "F"  "A"  "B"
#[4,] "D"  "E"  "F"  "A"  "B"  "C"
#[5,] "E"  "F"  "A"  "B"  "C"  "D"
#[6,] "F"  "A"  "B"  "C"  "D"  "E"
```

This will work for any initial vector of any length k, producing a final matrix of dimension k x k.
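To illustrate the k x k claim with a different length, here is a small sketch using four of the trait names from the question as the initial vector. The wrapper `shift_matrix` is a hypothetical name for the `rbind` call above.

```r
# Circular left shift by n (same definition as above)
shift <- function(x, n = 1) {
  if (n == 0) x else c(tail(x, -n), head(x, n))
}

# Hypothetical wrapper: stack all circular shifts of v as the rows of a matrix
shift_matrix <- function(v) {
  do.call(rbind, lapply(0:(length(v) - 1), function(i) shift(v, i)))
}

m <- shift_matrix(c("happy", "sad", "smart", "bored"))
dim(m)

# Every column holds each entry exactly once
all(apply(m, 2, function(col) setequal(col, m[1, ])))
```

Because row i is the start vector shifted by i - 1, each entry lands in every column exactly once.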

edited Apr 09 '18 at 23:26

answered Apr 09 '18 at 21:56

Maurits Evers


it's not OP's expected output – moodymudskipper Apr 09 '18 at 21:58
@Moody_Mudskipper Hmm. Isn't the `data.frame` the output OP wants to reconstruct? – Maurits Evers Apr 09 '18 at 21:59

not as I understand it, OP gives this df as a starting point and explains afterwards what's missing in it. – moodymudskipper Apr 09 '18 at 22:00
As pointed out by Moody_Mudskipper I'd like to grow this example with subsequent rows. Thank you for your time though – blazej Apr 10 '18 at 06:37
