Split a df$col into three groups in as many combinations as possible in R

Question

Suppose I have a df$col of length eight:

I want to divide this column into three parts in as many ways as possible, like this:

1  2  345678 

1 23456  78 

1 234567 8

123 45 678  

123456 7 8

and so on...

Can someone suggest a simple solution in R? Thanks

Your last sample is only two groups while all others are three. Are you looking for all combinations ranging from 1 group (all eight) to 8 groups (individuals)? Also, does `df$col` have spaces (as in your sample, `"1 2 3 4 5 6 7 8"`) or is it just `[0-9]{8}`? – r2evans Mar 27 '18 at 21:29
Thanks for the correction, I have edited the question. It's just `[0-9]{8}`. I am looking to divide this column into all possible groups of three. – user3698773 Mar 27 '18 at 21:32
This is very closely related to [Stirling Numbers of the Second Kind](https://en.wikipedia.org/wiki/Stirling_numbers_of_the_second_kind). – Joseph Wood Mar 27 '18 at 22:11
@Henrik, the `partitions` package is for additive integer partitions, so the code you posted is finding all possibilities of adding 3 numbers together that add up to 8. That is a different problem than the one posed here. You will note the very first column is `6 1 1` which doesn't make sense in this context. – Joseph Wood Mar 28 '18 at 01:22
@JosephWood I'm sure you know that the partitions are not the end result here (no need for mansplaining ;) ), but just a convenient way to create the lengths of the substrings (if you want to create an output as in Moody's answers), or to create a grouping variable (if you want split as Max Ft did). But OP hasn't managed to clearly describe the ultimate, desired result yet (see comments below). Cheers – Henrik Mar 28 '18 at 06:00
@Henrik, many apologies. I hadn't read the comments below when I wrote that. – Joseph Wood Mar 28 '18 at 11:59
@JosephWood No problem! :) I _could_ have fleshed out my comment to a complete answer (how to use the partitions to create substrings or a grouping variable), but I'm reluctant to answer when the desired output is so unclear. – Henrik Mar 28 '18 at 12:26

moodymudskipper · Accepted Answer · 2018-03-27T22:16:42.053

5

I generalized the case a bit:

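vec <- letters[1:8]
n <- 2  # number of cuts, which gives n + 1 groups
combn(length(vec)-1,n,function(x){
  # insert a space after each chosen position, last cut first
  # so that the earlier positions are not shifted
  for(i in rev(x)) vec <- append(vec," ",i)
  paste0(vec,collapse="")})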
# [1] "a b cdefgh" "a bc defgh" "a bcd efgh" "a bcde fgh" "a bcdef gh" "a bcdefg h" "ab c defgh" "ab cd efgh" "ab cde fgh" "ab cdef gh"
# [11] "ab cdefg h" "abc d efgh" "abc de fgh" "abc def gh" "abc defg h" "abcd e fgh" "abcd ef gh" "abcd efg h" "abcde f gh" "abcde fg h"
# [21] "abcdef g h"

The idea is that there are 7 places where a cut is possible, so we choose `n` of them using `combn`. By default it returns a matrix of combinations, but its `FUN` argument lets us process each combination on the fly to form the concatenated strings.

I used a good old for loop in the end to generalize the n parameter but we could do it with a recursive function as well.
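To work with the pieces of each split separately instead of as one pasted string, one option is to turn each pair of cut positions into a vector of group labels. The following is a minimal sketch under that assumption; the names `n_cuts` and `groupings` are illustrative and not part of the code above.

```r
vec <- letters[1:8]
n_cuts <- 2  # illustrative name: number of cuts, giving n_cuts + 1 groups

# For every combination of cut positions, label each element with its group.
# A cut at position p means a new group starts at index p + 1, so
# findInterval() counts how many group starts are <= each index.
groupings <- combn(length(vec) - 1, n_cuts, function(cuts) {
  findInterval(seq_along(vec), cuts + 1) + 1
}, simplify = FALSE)

length(groupings)   # 21, i.e. choose(7, 2)
groupings[[1]]      # 1 2 3 3 3 3 3 3

# The pieces of the first split as a list: "a" | "b" | "c" ... "h"
split(vec, groupings[[1]])
```

Each element of `groupings` could then be wrapped in `factor()` and used as the grouping variable of a model formula, if that matches the intended analysis.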

edited Mar 27 '18 at 22:16

answered Mar 27 '18 at 21:46

moodymudskipper


Thanks, right now it paste() and print everything together but how can I retrieve individual groups to work with? for example, "a b cdefgh"; I want to consider "a" as a separate group, so as "b" and "cdefgh" to run aov(). suggestions? – user3698773 Mar 27 '18 at 22:48
The aim is, after partitioning the df$col into a group of 3 with all possible combinations, to take each value out of these combinations and treat them as an input group to perform aov(); does it make any sense? "a b cdefgh" like "a" would be one group, "b" would be second, and "cdefgh" would be third. – user3698773 Mar 27 '18 at 23:00
You could use `strsplit` on my output for sure, but i'll see if i can get it more directly. You'd like a 3 col data.frame in the end ? – moodymudskipper Mar 27 '18 at 23:08
Thanks, in the end I need a combination of three different groups of df $col (as you already solved the problem) to run aov() test on them directly as aov(df$b ~ c("a", "b", "cdefgh")) and printing pvalue. Simple. – user3698773 Mar 27 '18 at 23:17
I'm not sure exactly what output you want, but here's something quick and dirty: `as.data.frame(t(unname(as.data.frame(strsplit(combn(length(vec)-1,n,function(x){ for(i in rev(x)) vec <- append(vec," ",i); paste0(vec,collapse="")})," ")))))`. If it's still not OK, maybe you should ask a new question. – moodymudskipper Mar 27 '18 at 23:33
Thanks, I'm running my code but got the following error: "" Error in combn(length(vec) - 1, 2, function(x) paste0(append(append(vec, : n < m "" any suggestions? – user3698773 Mar 28 '18 at 20:28

Frostic · Answer 2 · 2018-03-27T21:47:56.943

3

I like this question. Your problem comes down to picking all increasing combinations of 3 positions between 1 and `length(x)`. Each combination tells you where to split your original vector.

You just need to write a function that splits a vector based on a position vector, and then apply this function to all possible position vectors.

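x <- 1:5
n.group <- 3
# start a new group at every index listed in pos
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))
# apply splitAt to each column (one combination of positions) of the combn matrix
apply(combn(length(x),n.group),2,function(pos) splitAt(x,pos))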

The output is a list

[[1]]
[[1]][[1]]
[1] 1

[[1]][[2]]
[1] 2

[[1]][[3]]
[1] 3 4 5


[[2]]
[[2]][[1]]
[1] 1

[[2]][[2]]
[1] 2 3

[[2]][[3]]
[1] 4 5

...

[[10]]
[[10]][[1]]
[1] 1 2

[[10]][[2]]
[1] 3

[[10]][[3]]
[1] 4

[[10]][[4]]
[1] 5
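In the output above, the last element has four pieces because index 1 was not among the chosen positions, so the elements before the first chosen position form an extra group. A possible variant, sketched here as an assumption about the intended result rather than as the original method, fixes the first group at index 1 and chooses only the remaining `n.group - 1` start positions among `2..length(x)`. The name `res` is illustrative.

```r
x <- 1:5
n.group <- 3

# Same helper as above: a new group starts at every index listed in pos.
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

# Choose n.group - 1 cut positions among 1..(length(x) - 1); a cut after
# position p starts a new group at index p + 1. The first group always
# starts at index 1, so every result has exactly n.group pieces.
res <- combn(length(x) - 1, n.group - 1,
             FUN = function(cuts) splitAt(x, cuts + 1),
             simplify = FALSE)

length(res)           # 6, i.e. choose(4, 2)
unique(lengths(res))  # 3
```

Using `simplify = FALSE` keeps the result as a plain list of splits; with equal-length results, `apply` would otherwise collapse them into a list-matrix.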

edited Mar 27 '18 at 21:47

answered Mar 27 '18 at 21:39

Frostic


I like it - `combn` also has a `FUN=` argument that you could also take advantage of - `combn(length(x), n.group, FUN=function(pos) splitAt(x,pos), simplify=FALSE)`. Although I note that this sometimes ends up with 4 groups instead of 3. – thelatemail Mar 27 '18 at 21:52
I did not know that. Neat – Frostic Mar 27 '18 at 21:54
