Given `seq = "GAGTAGGAGGAG"`, how can I split this sequence into the subsequences "GAG", "TAG", "GAG", "GAG", i.e. into groups of three characters?
- You asked the same question yesterday: `strsplit("GAGTAGGAGGAG", "(?<=.{3})", perl=TRUE)` – Pierre L Jul 20 '16 at 14:10
- Can you please explain? I am not following it. – shrinirajesh Jul 20 '16 at 14:11
- This also works: `library(gsubfn); strapplyc(xx, "...")[[1]]`, where there are three dots in a row. – G. Grothendieck Jul 20 '16 at 14:15
- I get the output if I give the sequence directly, but if I use `sequence = readDNAStringSet("a.fasta")` and then call `strsplit(sequence, "(?<=.{3})", perl=TRUE)`, I get an error. @Pierre Lafortune – shrinirajesh Jul 20 '16 at 14:16
- We do not know what the `readDNAStringSet("a.fasta")` output is. How do you expect us to help with it? – Pierre L Jul 20 '16 at 14:21
- Take any FASTA file, assign it to a variable using `readDNAStringSet`, and then try `strsplit`. – shrinirajesh Jul 20 '16 at 14:25
- I do not have FASTA files. Add a small example of the output to the question in the form `dput(head(readDNAStringSet("a.fasta")))` – Pierre L Jul 20 '16 at 14:27
- What do you get when you enter `str(readDNAStringSet("a.fasta"))`? Add it to your question. – Pierre L Jul 20 '16 at 15:19
1 Answer
We can create a function called `fixed_split` that will split a character string into equal parts. The regular expression is a zero-width lookbehind that matches the position after `n` characters:

    fixed_split <- function(text, n) {
      # Zero-width lookbehind: split at the position after every n characters
      strsplit(text, paste0("(?<=.{", n, "})"), perl = TRUE)
    }

    fixed_split("GAGTAGGAGGAG", 3)
    #> [[1]]
    #> [1] "GAG" "TAG" "GAG" "GAG"

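
The same grouping can also be done by position instead of by regular expression, which may be easier to follow. This is only a minimal sketch, assuming the input is a single character string; `split_every` is a hypothetical helper name introduced here for illustration.

```r
# Hypothetical helper: cut a string into pieces of n characters by position.
split_every <- function(text, n) {
  stopifnot(is.character(text), length(text) == 1, n >= 1)
  if (nchar(text) == 0) return(character(0))
  starts <- seq(1, nchar(text), by = n)      # start positions: 1, 1 + n, 1 + 2n, ...
  substring(text, starts, starts + n - 1)    # a shorter trailing piece is kept as-is
}

codons <- split_every("GAGTAGGAGGAG", 3)
print(codons)
```

Unlike the regex version, this returns a plain character vector rather than a list, so no `[[1]]` is needed to get at the pieces.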
Edit
In your comment you say that `sequence = "ATGATGATG"` does not work, but with a plain character string it does:

    sequence <- "ATGATGATG"
    strsplit(sequence, "(?<=.{3})", perl = TRUE)
    #> [[1]]
    #> [1] "ATG" "ATG" "ATG"
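
If the error comes from passing the `readDNAStringSet()` result directly, one likely cause is that `strsplit` expects a character vector, not a Biostrings object. The sketch below is an assumption, since the actual object was never posted: it coerces the input with `as.character()` first and then splits every sequence. `split_sequences` is a hypothetical helper name, and plain strings stand in for the FASTA data so the example runs without Biostrings.

```r
# Hypothetical helper: coerce to character, then split each sequence into
# pieces of n characters using the same lookbehind as above.
split_sequences <- function(x, n = 3) {
  # Assumption: for a DNAStringSet, as.character() yields a character vector
  chr <- as.character(x)
  strsplit(chr, paste0("(?<=.{", n, "})"), perl = TRUE)
}

# Plain character input used as a stand-in for the sequences in a FASTA file
seqs <- c("ATGATGATG", "GAGTAGGAGGAG")
pieces <- split_sequences(seqs)
print(pieces)
```

With Biostrings loaded, the call would presumably be `split_sequences(readDNAStringSet("a.fasta"))`, though that has not been checked against your file.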

Pierre L