I previously asked for a method to split a string every 3 characters and save the results in a dataframe. Now I want to do the same thing, but with a sliding window of size n.

This question differs from the marked duplicate because the results here should be output as a dataframe. The mapply function given there would require quite some extra work to combine the results into a new dataframe and to add the positions as column names, as explained at the top of my previous question.

Example data

df <- data.frame(id = 1:2, seq = c('ABCDEF', 'XYZZZY'))

Looks like this:

  id    seq
1  1 ABCDEF
2  2 XYZZZY

Splitting into substrings of three characters, with a sliding window that advances n = 1 character at a time, should give:

id  1   2   3   4
1   ABC BCD CDE DEF
2   XYZ YZZ ZZZ ZZY

I tried to do this using the separate function, as answered on my previous post. However, as far as I can tell, it can only split at fixed split points rather than over a range.


CodeNoob
  • This partially does the job. Maybe you can tweak it with `rapply`: `lapply(seq_along(df$seq),function(x) substring(df$seq,x,x+2))`. – NelsonGon Jun 03 '19 at 13:10
  • `library(zoo); with(df, data.frame(id, t(sapply(strsplit(as.character(seq), ""), rollapplyr, 3, paste, collapse = "")), check.names = FALSE, stringsAsFactors = FALSE))` – G. Grothendieck Jun 03 '19 at 13:19
  • That worked fine! Thank you @G.Grothendieck – CodeNoob Jun 03 '19 at 14:10
  • `cbind(df , t(data.frame(lapply(df[,2] , function(dfword){mapply(function(x, y){substr(dfword, x, y)}, x=1:(nc-2), y=3:nc)}),row.names = NULL)))` – M-- Jun 03 '19 at 19:26
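
For readers who want to avoid the zoo dependency, the substring idea from the comments above can be sketched in base R. This is only a minimal sketch, not the method from any of the comments: the helper name `sliding_windows` and its `col`, `width` and `step` parameters are hypothetical, and it assumes all strings in the column have the same length.

```r
# Hypothetical helper (not from the original post): cut every string in
# column `col` into windows of `width` characters, moving `step` characters
# at a time, and return the windows as extra columns named by start position.
sliding_windows <- function(df, col = "seq", width = 3, step = 1) {
  s <- as.character(df[[col]])            # also handles factor columns
  n <- unique(nchar(s))
  # Assumption of this sketch: all strings share one length >= width
  stopifnot(length(n) == 1, n >= width)
  starts <- seq(1, n - width + 1, by = step)
  # Repeat each string once per window; substring() recycles `starts`
  pieces <- substring(rep(s, each = length(starts)),
                      starts, starts + width - 1)
  windows <- matrix(pieces, nrow = length(s), byrow = TRUE)
  colnames(windows) <- as.character(starts)
  # Keep the remaining columns (here: id) and append the windows
  data.frame(df[setdiff(names(df), col)], windows,
             check.names = FALSE, stringsAsFactors = FALSE)
}

df <- data.frame(id = 1:2, seq = c('ABCDEF', 'XYZZZY'))
res <- sliding_windows(df, width = 3, step = 1)
print(res)
```

With the example data this yields the columns `id`, `1`, `2`, `3`, `4`, matching the layout asked for in the question. Setting `step = 3` would instead give non-overlapping chunks, as in the earlier question.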
