Given the vector:

vec <- c(LETTERS[1:10])

I would like to be able to combine its elements in the following manner:

resA <- c("AB", "CD", "EF", "GH", "IJ")
resB <- c("ABCDEF","GHIJ")

where the elements of vec are merged according to the desired size of each element in the resulting vector: 2 in the case of resA and 6 in the case of resB.

Desired solution characteristics

  • The solution should be flexible with respect to the element size, i.e. I may want vectors with elements of size 2 or 20
  • There may not be enough elements in the vector to fill the last chunk; in that case the last element should be shortened accordingly (as shown)
  • This shouldn't make a difference, but the solution should work on words as well

Attempts

Initially, I was thinking of using something along the lines of:

c(
  paste0(vec[1:2], collapse = ""),
  paste0(vec[3:4], collapse = ""),
  paste0(vec[5:6], collapse = "")
  # ...
)

but this would have to be adapted to step through the remaining pairs (or bigger groups) of vec and to handle the last group, which would often be smaller.
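As a rough sketch of how the manual attempt might be generalized, the start index of each group can be generated with seq() and the end index capped at the length of the vector. The chunk size n = 4 below is only an illustrative value.

```r
vec <- LETTERS[1:10]
n <- 4  # desired chunk size (illustrative value)

# Start index of every group: 1, 5, 9 for n = 4
starts <- seq(1, length(vec), by = n)

res <- vapply(starts, function(i) {
  end <- min(i + n - 1, length(vec))  # the last group may be shorter
  paste0(vec[i:end], collapse = "")
}, character(1))

res
# [1] "ABCD" "EFGH" "IJ"
```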

Konrad

3 Answers

Here is what I came up with. Using Harlan's idea in this question, you can split the vector into chunks of a given size. Your paste0() idea then goes inside lapply(), and finally you unlist the resulting list.

unlist(lapply(split(vec, ceiling(seq_along(vec)/2)), function(x){paste0(x, collapse = "")}))

#   1    2    3    4    5 
#"AB" "CD" "EF" "GH" "IJ" 

unlist(lapply(split(vec, ceiling(seq_along(vec)/5)), function(x){paste0(x, collapse = "")}))

#      1       2 
#"ABCDE" "FGHIJ" 

unlist(lapply(split(vec, ceiling(seq_along(vec)/3)), function(x){paste0(x, collapse = "")}))

#    1     2     3     4 
#"ABC" "DEF" "GHI"   "J" 
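The three calls above differ only in the chunk size, so they could be wrapped in a small function. This is a minimal sketch, not part of the original answer: the name chunk_paste is hypothetical, vapply() is swapped in for lapply() plus unlist(), and unname() drops the "1", "2", ... labels that split() adds.

```r
# Hypothetical helper: chunk_paste is an illustrative name.
chunk_paste <- function(x, n) {
  stopifnot(n >= 1)
  groups <- ceiling(seq_along(x) / n)  # 1, 1, 2, 2, ... for n = 2
  # vapply() guarantees a character vector; unname() removes the group labels
  unname(vapply(split(x, groups), paste0, character(1), collapse = ""))
}

vec <- LETTERS[1:10]
chunk_paste(vec, 3)
# [1] "ABC" "DEF" "GHI" "J"

# Works on words too: whole elements are grouped, not characters
chunk_paste(c("foo", "bar", "baz"), 2)
# [1] "foobar" "baz"
```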
jazzurro
vec <- c(LETTERS[1:10])

f1 <- function(x, n){
  f <- function(x) paste0(x, collapse = '')
  regmatches(f(x), gregexpr(f(rep('.', n)), f(x)))[[1]]
}

f1(vec, 2)
# [1] "AB" "CD" "EF" "GH" "IJ"

or

f2 <- function(x, n)
  apply(matrix(x, nrow = n), 2, paste0, collapse = '')

f2(vec, 5)
# [1] "ABCDE" "FGHIJ"

or

f3 <- function(x, n) {
  f <- function(x) paste0(x, collapse = '')
  strsplit(gsub(sprintf('(%s)', f(rep('.', n))), '\\1 ', f(x)), '\\s+')[[1]]
}

f3(vec, 4)
# [1] "ABCD" "EFGH" "IJ"  

I would say the last is the best of these, since for the others n must divide length(x) evenly: otherwise f1 silently drops the trailing partial chunk, and f2 recycles elements with a warning.
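To make the limitation concrete, here is a short sketch of what the f1-style and f2-style approaches return for n = 3, which does not divide the 10 elements evenly. The helper name collapse is only illustrative, and suppressWarnings() is used just to keep the output quiet.

```r
vec <- LETTERS[1:10]
collapse <- function(x) paste0(x, collapse = "")  # illustrative helper

# f1-style regex: only full runs of 3 characters match, so "J" is lost
r1 <- regmatches(collapse(vec), gregexpr("...", collapse(vec)))[[1]]
r1
# [1] "ABC" "DEF" "GHI"

# f2-style matrix: the 10 elements are recycled to fill a 3 x 4 matrix
r2 <- suppressWarnings(apply(matrix(vec, nrow = 3), 2, paste0, collapse = ""))
r2
# [1] "ABC" "DEF" "GHI" "JAB"
```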

edit - more

f4 <- function(x, n) {
  f <- function(x) paste0(x, collapse = '')
  Vectorize(substring, USE.NAMES = FALSE)(f(x), which((seq_along(x) %% n) == 1),
                                          which((seq_along(x) %% n) == 0))
}

f4(vec, 2)
# [1] "AB" "CD" "EF" "GH" "IJ"

or

f5  <- function(x, n)
  mapply(function(x) paste0(x, collapse = ''),
         split(x, c(0, head(cumsum(rep_len(sequence(n), length(x)) %in% n), -1))),
         USE.NAMES = FALSE)

f5(vec, 4)
# [1] "ABCD" "EFGH" "IJ"  
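Note that f4 builds its start and end indices separately, and the two vectors have different lengths when n does not divide length(x) evenly. A possible variant, sketched here under the assumption that cutting the collapsed string by character is what is wanted, computes only the start positions and lets substring() stop at the end of the string. The name chunk_chars is hypothetical.

```r
# Hypothetical helper, not one of f1 to f5 above.
chunk_chars <- function(x, n) {
  s <- paste0(x, collapse = "")
  if (!nzchar(s)) return(character(0))  # nothing to cut
  starts <- seq(1, nchar(s), by = n)
  # substring() stops at the end of the string, so the last piece may be short
  substring(s, starts, starts + n - 1)
}

chunk_chars(LETTERS[1:10], 4)
# [1] "ABCD" "EFGH" "IJ"
chunk_chars(LETTERS[1:10], 3)
# [1] "ABC" "DEF" "GHI" "J"
```

Because the input is collapsed first, multi-character elements are cut by character rather than kept whole.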
rawr

Here is another way, working with the original vector. As a side note, working with words is not straightforward, since there are at least two ways to understand it: you can either keep each word intact, or collapse the words first and work on individual characters. The following function can handle both options.

vec <- c(LETTERS[1:10])
vec2 <- c("AB","CDE","F","GHIJ")

cuts <- function(x, n, bychar=FALSE) {
    if (bychar) x <- unlist(strsplit(paste0(x, collapse=""), ""))
    ii <- seq_along(x)
    li <- split(ii, ceiling(ii/n))
    return(sapply(li, function(y) paste0(x[y], collapse="")))
}

cuts(vec2, 2, FALSE)
#      1       2 
# "ABCDE" "FGHIJ" 

cuts(vec2, 2, TRUE)
#    1    2    3    4    5 
# "AB" "CD" "EF" "GH" "IJ" 
Carlos Alberto