4

I have a string "ab" and I want to generate all possible combination of dots between a, b. For example,

<code>enter image description here</code>

In this case there could be maximum 3 dots (no two consecutive dots) and minimum dots is 0. "ab" is just a toy example and the length of the string can be as large as 30. I have no clue where to start. Any help will be highly appreciated. Thanks in advance.

Joseph Wood
  • 7,077
  • 2
  • 30
  • 65
user3642360
  • 762
  • 10
  • 23
  • 1
    This is a nice question, but you should update your question according to [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – pogibas Jan 14 '19 at 17:40
  • Thanks. I have updated the example. – user3642360 Jan 14 '19 at 17:46
  • 1
    Please read linked post and post data using `dput()` function – pogibas Jan 14 '19 at 17:47

2 Answers2

3

This is deliberately not a complete answer, but it's a very good start.

If you have n letters there are n + 1 possible positions for dots, and 2^(n + 1) as each position can either have a dot or no dot. You just need to iterate through those possibilities. We'll begin by generating these 2^(n + 1) dot patterns using expand.grid:

input = "abc"
n = nchar(input)
dots = do.call(expand.grid, rep(list(c("", ".")), n + 1))
dots
#    Var1 Var2 Var3 Var4
# 1                     
# 2     .               
# 3          .          
# 4     .    .          
# 5               .     
# 6     .         .     
# 7          .    .     
# 8     .    .    .     
# 9                    .
# 10    .              .
# 11         .         .
# 12    .    .         .
# 13              .    .
# 14    .         .    .
# 15         .    .    .
# 16    .    .    .    .

I'll leave to you finishing up - breaking up your input string into individual letters strsplit(input, ""), and using paste0 or similar to combine the letters with the dots.

You say your input can have length up to 30. The result there will have 2^31 = 2,147,483,648 combinations, which is quite a few. You may run into memory constraints doing this in R, depending on your machine. I'd consider thinking about whether you really need to generate all the combinations. A usually better approach would be to use iterators (see, e.g., the iterators package). This could help you generate any arbitrary combination that you want, without requiring you to generate every combination.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • 1
    Thanks Gregor. But I think there is one closing bracket missing in your code. I have tried with dots = do.call(expand.grid, rep(list(c("", "."), n + 1))) but getting wrong output. – user3642360 Jan 14 '19 at 17:54
2

This is similar to Stars and bars. Here, our letters are analogous to the stars and the dots are analogous to the bars:

## transform string into stars and bars format
ab

## add spaces around each letter    
_a_b_

## substitute stars (i.e. asterisks) for letters
_*_*_

Now, we simply ask the question:

Given n spaces, how many ways can we fill those spaces with 0 to n bars?

As @Gregor pointed out, this turns out to be the sum of the binomial coefficients (assuming n letters):

sum(sapply(0:(n + 1), function(x) combinat::nCm(n + 1, x))) == 2^(n + 1)

With base R we can easily achieve the desired result:

myStr <- "abcd"
indStr <- strsplit(myStr, split = "")[[1]]
strTemplate <- vector("character", length = (length(indStr) * 2 + 1))
strTemplate[seq(2, length(strTemplate), 2)] <- indStr

strTemplate
[1] ""  "a" ""  "b" ""  "c" ""  "d" "" 

dotVec <- seq(1L, length(strTemplate), 2L)

dotVec
[1] 1 3 5 7 9

unlist(lapply(1:length(dotVec), function(x) {
    combn(dotVec, x, FUN = function(y) {
        temp <- strTemplate
        temp[y] <- "."
        paste0(temp, collapse = "")
    })
}))

 [1] ".abcd"     "a.bcd"     "ab.cd"     "abc.d"     "abcd."    
 [6] ".a.bcd"    ".ab.cd"    ".abc.d"    ".abcd."    "a.b.cd"   
[11] "a.bc.d"    "a.bcd."    "ab.c.d"    "ab.cd."    "abc.d."   
[16] ".a.b.cd"   ".a.bc.d"   ".a.bcd."   ".ab.c.d"   ".ab.cd."  
[21] ".abc.d."   "a.b.c.d"   "a.b.cd."   "a.bc.d."   "ab.c.d."  
[26] ".a.b.c.d"  ".a.b.cd."  ".a.bc.d."  ".ab.c.d."  "a.b.c.d." 
[31] ".a.b.c.d."
Joseph Wood
  • 7,077
  • 2
  • 30
  • 65