Input : kdff455556tfkkkw
Output1 : kdf[2]45[4]6tfk[3]w
Hint: Replace each repeated "alphnum" with alphnum[No_Of_Repetition]
s <- c('kdff455556tfkkkw','abc','abbccc')
# rle() finds the runs; runs of length 1 stay as-is, longer runs become value[length]
sapply(strsplit(s,''),function(x) paste(with(rle(x),ifelse(lengths==1,values,paste0(values,'[',lengths,']'))),collapse=''))
## [1] "kdf[2]45[4]6tfk[3]w" "abc" "ab[2]c[3]"
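To see what that one-liner relies on, here is a sketch that unpacks it for a single string. The helper name `encode_runs` is made up for illustration and is not part of the original answer; `strsplit()`, `rle()` and `ifelse()` are base R.

```r
# Hypothetical helper (name invented for this sketch): encode the runs of one string.
encode_runs <- function(string) {
  chars <- strsplit(string, '')[[1]]   # split into single characters
  runs  <- rle(chars)                  # $values = run characters, $lengths = run lengths
  # keep single characters as they are, rewrite longer runs as value[length]
  pieces <- ifelse(runs$lengths == 1,
                   runs$values,
                   paste0(runs$values, '[', runs$lengths, ']'))
  paste(pieces, collapse = '')
}

rle(strsplit('abbccc', '')[[1]])$lengths
# 1 2 3
encode_runs('abbccc')
# "ab[2]c[3]"
```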
Here's a roundabout regex approach:
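library(stringr)
# a run: a lowercase letter or digit followed by one or more copies of itself
p <- "(([a-z0-9])\\2+)"
# the pieces of each string that sit between the runs
x <- str_split(s,p)
# each run rewritten as char[length]; "" when a string has no runs
# (lapply rather than sapply, so r stays a list even if every string has the same number of runs)
r <- lapply(str_extract_all(s,p),
  function(x)if (length(x))paste0(substr(x,1,1),"[",nchar(x),"]")else "")
# interleave pieces and encoded runs, then glue them together
mapply(function(x,r)paste0(
  c(x,r)[order(c(seq_along(x),seq_along(r)))]
  ,collapse=""),x,r)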
# output for @bgoldst's input:
# [1] "kdf[2]45[4]6tfk[3]w" "abc" "ab[2]c[3]"
The last step is an interleaving trick copied from @Arun.
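For anyone puzzled by that step, here is a small sketch of the interleaving on its own, using the pieces and runs from the first input string. The variable names `idx` and `interleaved` are just for illustration. It relies on `order()` leaving tied values in their original order.

```r
x <- c("kd", "4", "6tf", "w")      # pieces between the runs
r <- c("f[2]", "5[4]", "k[3]")     # the encoded runs

# Positions 1..4 for x followed by 1..3 for r; ordering them alternates x and r.
idx <- order(c(seq_along(x), seq_along(r)))
idx
# 1 5 2 6 3 7 4

interleaved <- c(x, r)[idx]
paste0(interleaved, collapse = "")
# "kdf[2]45[4]6tfk[3]w"
```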
I wish my initial guess had worked:
sapply(s,function(x)gsub(p,paste0("\\2[",nchar("\\1"),"]"),x))
But nchar("\\1") is evaluated before gsub ever sees it, on the literal two-character string "\\1" (a backslash and a 1), so every run gets a length of 2:
# kdff455556tfkkkw abc abbccc
# "kdf[2]45[2]6tfk[2]w" "abc" "ab[2]c[2]"
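One way to get the "compute the replacement from each match" behaviour in base R is `gregexpr()` together with `regmatches<-`. This is a different technique from the `gsub` attempt, offered as a sketch only; the names `out`, `m` and `encode` are made up here.

```r
s <- c('kdff455556tfkkkw', 'abc', 'abbccc')
out <- s

# Locate every run: a lowercase letter or digit followed by copies of itself.
m <- gregexpr("([a-z0-9])\\1+", out)

# Build the replacement for each run from the matched text itself.
encode <- function(run) {
  if (length(run)) paste0(substr(run, 1, 1), "[", nchar(run), "]")
  else character(0)   # no runs in this string: nothing to replace
}

regmatches(out, m) <- lapply(regmatches(out, m), encode)
out
# "kdf[2]45[4]6tfk[3]w" "abc" "ab[2]c[3]"
```

The `length(run)` guard matters because `paste0()` treats a zero-length argument as `""`, which would otherwise produce a stray `"[]"` for strings without runs.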