Combining elements in a string vector with defined element size and accounting for uneven sizes

Question

Given a vector:

vec <- c(LETTERS[1:10])

I would like to be able to combine it in the following manner:

resA <- c("AB", "CD", "EF", "GH", "IJ")
resB <- c("ABCDEF","GHIJ")

where elements of the vector vec are merged together according to the desired size of each element in the resulting vector. This is 2 in the case of resA and 6 in the case of resB.

Desired solution characteristics

The solution should be flexible with respect to the element size, i.e. I may want vectors with elements of size 2 or 20.
There may not be enough elements in the vector to fill the last chunk; in that case the last element should be shortened accordingly (as shown in resB).
This shouldn't make a difference, but the solution should work on words as well.

Attempts

Initially, I was thinking of using something along these lines:

c(
  paste0(vec[1:2], collapse = ""),
  paste0(vec[3:4], collapse = ""),
  paste0(vec[5:6], collapse = "")
  # ...
)

but this would have to be adapted to step through the remaining pairs (or larger groups) of vec and to handle the last group, which would often be smaller.

You may find [this question](http://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r) helpful. — jazzurro, Jan 24 '16 at 03:24
@jazzurro I'll have a look, what I need is that but other round, it seems. — Konrad, Jan 24 '16 at 03:34

score 3 · Answer 1 · edited May 23 '17 at 11:58

Here is what I came up with. Using Harlan's idea in this question, you can split the vector into chunks of the desired size. You can then apply your paste0() idea to each chunk with lapply(). Finally, you unlist() the resulting list.

unlist(lapply(split(vec, ceiling(seq_along(vec)/2)), function(x){paste0(x, collapse = "")}))

#   1    2    3    4    5 
#"AB" "CD" "EF" "GH" "IJ" 

unlist(lapply(split(vec, ceiling(seq_along(vec)/5)), function(x){paste0(x, collapse = "")}))

#      1       2 
#"ABCDE" "FGHIJ" 

unlist(lapply(split(vec, ceiling(seq_along(vec)/3)), function(x){paste0(x, collapse = "")}))

#    1     2     3     4 
#"ABC" "DEF" "GHI"   "J"
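The three calls above differ only in the chunk size, so the pattern could be wrapped in a small helper. This is a minimal sketch: the name chunk_paste is my own and not part of base R, and unname() is used only to drop the group labels that split() adds.

```r
# Hypothetical helper: collapse consecutive runs of `n` elements of `x`.
chunk_paste <- function(x, n) {
  stopifnot(n >= 1)
  # ceiling(i / n) maps positions 1..n to group 1, n+1..2n to group 2, ...
  groups <- ceiling(seq_along(x) / n)
  # vapply guarantees a character result, one string per group
  unname(vapply(split(x, groups), paste0, character(1), collapse = ""))
}

vec <- LETTERS[1:10]
chunk_paste(vec, 2)
# [1] "AB" "CD" "EF" "GH" "IJ"
chunk_paste(vec, 6)
# [1] "ABCDEF" "GHIJ"
chunk_paste(vec, 20)
# [1] "ABCDEFGHIJ"
```

A chunk size larger than the vector simply yields a single element, which covers the "size 2 or 20" requirement from the question.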

rawr · Accepted Answer · 2016-01-24T03:50:23.860

vec <- c(LETTERS[1:10])

f1 <- function(x, n){
  f <- function(x) paste0(x, collapse = '')
  regmatches(f(x), gregexpr(f(rep('.', n)), f(x)))[[1]]
}

f1(vec, 2)
# [1] "AB" "CD" "EF" "GH" "IJ"

or

f2 <- function(x, n)
  apply(matrix(x, nrow = n), 2, paste0, collapse = '')

f2(vec, 5)
# [1] "ABCDE" "FGHIJ"

or

f3 <- function(x, n) {
  f <- function(x) paste0(x, collapse = '')
  strsplit(gsub(sprintf('(%s)', f(rep('.', n))), '\\1 ', f(x)), '\\s+')[[1]]
}

f3(vec, 4)
# [1] "ABCD" "EFGH" "IJ"

I would say the last is the best of these, since for the others n must be a factor of length(x) or you will get warnings or recycling.
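To make the point about factors concrete: with n = 3, f1 returns only "ABC" "DEF" "GHI" and drops the trailing "J", because the pattern demands exactly n characters. One possible fix for the regex approach is to match between 1 and n characters instead. This is only a sketch; the name f1b is hypothetical, and like f1 it works on the collapsed string, so it splits by character rather than by element.

```r
# Hypothetical variant of f1: the quantifier {1,n} lets the final match be shorter.
f1b <- function(x, n) {
  s <- paste0(x, collapse = "")
  regmatches(s, gregexpr(sprintf(".{1,%d}", n), s))[[1]]
}

vec <- LETTERS[1:10]
f1b(vec, 3)
# [1] "ABC" "DEF" "GHI" "J"
```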

edit - more

f4 <- function(x, n) {
  f <- function(x) paste0(x, collapse = '')
  Vectorize(substring, USE.NAMES = FALSE)(f(x), which((seq_along(x) %% n) == 1),
                                          which((seq_along(x) %% n) == 0))
}

f4(vec, 2)
# [1] "AB" "CD" "EF" "GH" "IJ"

or

f5  <- function(x, n)
  mapply(function(x) paste0(x, collapse = ''),
         split(x, c(0, head(cumsum(rep_len(sequence(n), length(x)) %in% n), -1))),
         USE.NAMES = FALSE)

f5(vec, 4)
# [1] "ABCD" "EFGH" "IJ"

score 2 · Answer 3 · answered Jan 24 '16 at 04:26

Here is another way, working with the original vector. As a side note, working with words is not straightforward, since there are at least two ways to understand it: you can either keep each word intact or collapse them first and get individual characters. The following function can deal with both options.

vec <- c(LETTERS[1:10])
vec2 <- c("AB","CDE","F","GHIJ")

cuts <- function(x, n, bychar=FALSE) {
    # bychar=TRUE: collapse everything first and regroup single characters
    if (bychar) x <- unlist(strsplit(paste0(x, collapse=""), ""))
    ii <- seq_along(x)
    li <- split(ii, ceiling(ii/n))
    return(sapply(li, function(y) paste0(x[y], collapse="")))
}

cuts(vec2, 2, FALSE)
#      1       2 
# "ABCDE" "FGHIJ" 

cuts(vec2, 2, TRUE)
#    1    2    3    4    5 
# "AB" "CD" "EF" "GH" "IJ"
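When the elements are whole words, a separator between them is usually wanted. The following is a sketch along the same lines as cuts; the name cuts_sep, its sep argument, and the example vector words are assumptions of mine rather than part of the function above.

```r
# Hypothetical variant of cuts(): keeps elements intact and joins each group with `sep`.
cuts_sep <- function(x, n, sep = " ") {
  ii <- seq_along(x)
  li <- split(ii, ceiling(ii / n))  # index groups of at most n elements
  unname(vapply(li, function(y) paste(x[y], collapse = sep), character(1)))
}

words <- c("the", "quick", "brown", "fox", "jumps")
cuts_sep(words, 2)
# [1] "the quick" "brown fox" "jumps"
```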
