
Suppose I have a column `df$col` of length eight:

1
2
3
4
5
6
7
8  

I want to split this column into three consecutive parts in every possible way, like this:

1 2 345678

1 23456 78

1 234567 8

123 45 678

123456 7 8

and so on...

Can someone suggest a simple solution in R? Thanks

user3698773
  • Your last sample is only two groups while all others are three. Are you looking for all combinations ranging from 1 group (all eight) to 8 groups (individuals)? Also, does `df$col` have spaces (as in your sample, `"1 2 3 4 5 6 7 8"`) or is it just `[0-9]{8}`? – r2evans Mar 27 '18 at 21:29
  • Thanks for the correction, I have edited the question. It's just `[0-9]{8}`. I am looking to divide this column into all possible groups of three. – user3698773 Mar 27 '18 at 21:32
  • This is very closely related to [Stirling Numbers of the Second Kind](https://en.wikipedia.org/wiki/Stirling_numbers_of_the_second_kind). – Joseph Wood Mar 27 '18 at 22:11
  • `partitions::compositions(8, 3, include.zero = FALSE)` – Henrik Mar 27 '18 at 23:21
  • @Henrik, the `partitions` package is for additive integer partitions, so the code you posted is finding all possibilities of adding 3 numbers together that add up to 8. That is a different problem than the one posed here. You will note the very first column is `6 1 1` which doesn't make sense in this context. – Joseph Wood Mar 28 '18 at 01:22
  • @JosephWood I'm sure you know that the partitions are not the end result here (no need for mansplaining ;) ), but just a convenient way to create the lengths of the substrings (if you want to create an output as in Moody's answers), or to create a grouping variable (if you want split as Max Ft did). But OP hasn't managed to clearly describe the ultimate, desired result yet (see comments below). Cheers – Henrik Mar 28 '18 at 06:00
  • @Henrik, many apologies. I hadn't read the comments below when I wrote that. – Joseph Wood Mar 28 '18 at 11:59
  • @JosephWood No problem! :) I _could_ have fleshed out my comment to a complete answer - how to use the partitions to create substrings or a grouping variable - but I'm reluctant to answer when the desired output is so unclear. – Henrik Mar 28 '18 at 12:26
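
To illustrate the idea behind the `compositions` suggestion in the comments above: each composition of 8 into three positive parts can be read as the lengths of three consecutive groups. Here is a minimal base R sketch, assuming a single length vector (`lens` and `grp` are hypothetical names, and `c(1, 2, 5)` is just one of the possible compositions):

```r
x <- 1:8

# Assumption: one composition of 8 into three positive parts,
# read as the lengths of three consecutive groups.
lens <- c(1, 2, 5)

# Grouping variable: group i is repeated lens[i] times.
grp <- rep(seq_along(lens), times = lens)

# Split the vector into its three consecutive groups.
groups <- split(x, grp)
groups
```

Repeating this for every composition would enumerate all the splits; `grp` could also serve directly as a grouping factor.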

2 Answers

5

I generalized the case a bit:

vec <- letters[1:8]    
n <- 2
combn(length(vec)-1,n,function(x){
  for(i in rev(x)) vec <- append(vec," ",i)
  paste0(vec,collapse="")})
# [1] "a b cdefgh" "a bc defgh" "a bcd efgh" "a bcde fgh" "a bcdef gh" "a bcdefg h" "ab c defgh" "ab cd efgh" "ab cde fgh" "ab cdef gh"
# [11] "ab cdefg h" "abc d efgh" "abc de fgh" "abc def gh" "abc defg h" "abcd e fgh" "abcd ef gh" "abcd efg h" "abcde f gh" "abcde fg h"
# [21] "abcdef g h"

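The idea is that there are 7 places where the vector can be cut, so we enumerate every way of choosing `n` of them with `combn` (here `n` is the number of cuts, so `n <- 2` gives three groups). Instead of building the matrix of combinations first, we apply a function to each combination on the fly through the `FUN` argument of `combn` to form the concatenated strings.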

I used a good old for loop in the end to generalize the n parameter but we could do it with a recursive function as well.
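
The same cut-point idea can also return the groups themselves instead of pasted strings. This is only a sketch under the same setup as above; `n_groups`, `cuts`, and `splits` are names made up for the illustration, and `findInterval` is swapped in for the `append` loop:

```r
vec <- letters[1:8]
n_groups <- 3  # hypothetical name: number of groups wanted

# Every way of choosing n_groups - 1 cut points among the 7 gaps.
# A cut value k means "cut between element k and element k + 1".
cuts <- combn(length(vec) - 1, n_groups - 1, simplify = FALSE)

splits <- lapply(cuts, function(cut) {
  # Group id of each element: 1 + number of cuts located before it.
  grp <- findInterval(seq_along(vec), cut + 1) + 1
  unname(split(vec, grp))
})

length(splits)  # one entry per combination of cut points
splits[[1]]     # first split, as a list of three character vectors
```

Each element of `splits` is a list of three vectors, which may be easier to feed into further processing than a single string.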

moodymudskipper
  • Thanks. Right now it uses `paste()` and prints everything together, but how can I retrieve the individual groups to work with? For example, in "a b cdefgh" I want to treat "a" as a separate group, and likewise "b" and "cdefgh", to run `aov()`. Suggestions? – user3698773 Mar 27 '18 at 22:48
  • The aim is, after partitioning the df$col into a group of 3 with all possible combinations, to take each value out of these combinations and treat them as an input group to perform aov(); does it make any sense? "a b cdefgh" like "a" would be one group, "b" would be second, and "cdefgh" would be third. – user3698773 Mar 27 '18 at 23:00
  • You could use `strsplit` on my output for sure, but i'll see if i can get it more directly. You'd like a 3 col data.frame in the end ? – moodymudskipper Mar 27 '18 at 23:08
  • Thanks, in the end I need a combination of three different groups of df $col (as you already solved the problem) to run aov() test on them directly as aov(df$b ~ c("a", "b", "cdefgh")) and printing pvalue. Simple. – user3698773 Mar 27 '18 at 23:17
  • I'm not sure exactly what output you want, here's something quick and dirty: `as.data.frame(t(unname(as.data.frame(strsplit(combn(length(vec)-1,n,function(x){ for(i in rev(x)) vec <- append(vec," ",i) paste0(vec,collapse="")})," ")))))`. If it's still not OK maybe you should ask a new question – moodymudskipper Mar 27 '18 at 23:33
  • Thanks, I'm running my code but got the following error: "Error in combn(length(vec) - 1, 2, function(x) paste0(append(append(vec, : n < m". Any suggestions? – user3698773 Mar 28 '18 at 20:28
3

I like this question. Your problem comes down to picking all combinations of split positions within the vector. Each combination tells you where to split your original vector.

You just need to write a function that splits a vector based on a position vector, and then apply this function to all possible position vectors.

x=1:5
n.group=3
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))
apply(combn(length(x),n.group),2,function(pos) splitAt(x,pos))

The output is a list:

[[1]]
[[1]][[1]]
[1] 1

[[1]][[2]]
[1] 2

[[1]][[3]]
[1] 3 4 5


[[2]]
[[2]][[1]]
[1] 1

[[2]][[2]]
[1] 2 3

[[2]][[3]]
[1] 4 5

...

[[10]]
[[10]][[1]]
[1] 1 2

[[10]][[2]]
[1] 3

[[10]][[3]]
[1] 4

[[10]][[4]]
[1] 5
Frostic
  • I like it - `combn` also has a `FUN=` argument that you could also take advantage of - `combn(length(x), n.group, FUN=function(pos) splitAt(x,pos), simplify=FALSE)`. Although I note that this sometimes ends up with 4 groups instead of 3. – thelatemail Mar 27 '18 at 21:52
  • I did not know that. Neat – Frostic Mar 27 '18 at 21:54
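
The four-group results noted in the comment above appear whenever position 1 is not among the chosen positions: the elements before the first position then form an extra leading group. One possible adjustment, sketched here as an assumption rather than as the answer's own code, is to choose only `n.group - 1` positions and restrict them to `2:length(x)` (`res` is a hypothetical name):

```r
x <- 1:5
n.group <- 3

# Same helper as in the answer: a new group starts at each position in pos.
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

# Assumption: n.group groups need only n.group - 1 starting positions,
# taken from 2:length(x). Choosing among length(x) - 1 gaps and adding 1
# produces exactly those positions.
res <- combn(length(x) - 1, n.group - 1,
             FUN = function(cut) splitAt(x, cut + 1),
             simplify = FALSE)

length(res)  # number of splits
res[[1]]     # first split
```

With this change every element of `res` should contain exactly `n.group` groups.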