I am trying to read the letters in a sequence in groups of three, moving forward one letter at a time, until I find "t a c" in the data frame. After that I want to group the rest in threes, moving forward three letters at a time, until I find "a t t".

Example of what I mean:

"agttacgtaattatgat"

It should produce:

agt,gtt,tta,tac  stop, gta,att  stop ,atg,tga,gat

(The data frame's name is agen.)

My code for that:

 y=c() 
x=1 
while(x<853){ 
  x=x+1
 rt<-paste(agen[x],agen[x+1],agen[x+2])
  y=c(y,rt)
  ff<-data.frame(y)
  if(ff=="t a c"){break}
}

ay=c()
while(x<853){                            
  x=x+3
  art<-paste(agen[x],agen[x+1],agen[x+2])
  ay=c(ay,art)
  aff<-data.frame(ay)
  if(aff=="a t t"){break}
}

The first loop works fine, but the second one does not break.

There will be a lot of stops and starts in the sequence, so can you help me write a loop that can do the job?

zx8754
oks26
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Your second loop uses `ff` but that variable never changes in the loop; did you mean `aff`? Though it seems odd in both cases to compare a data.frame to a string like that. – MrFlick May 06 '19 at 16:18
  • I had written that wrong and have edited my code now. I meant aff – oks26 May 06 '19 at 16:23

1 Answer

I only roughly understand what you need, but here is a code example that may do it. I used the example you specified, with a character vector of DNA bases instead of a data frame. I also changed a few style details.

agen_string <- "agttacgtaattatgat"
# This is a character vector, not a data frame; I don't know why you want to use a data frame here.
agen <- strsplit(agen_string, split = "")[[1]]

y <- c()
x <- 0 # Start at 0; otherwise a 'tac' at the very beginning would be missed
# Search for the 'tac' triplet, moving one base at a time
while (x + 3 <= length(agen)) { # stop before the triplet runs past the end
  x <- x + 1
  rt <- paste(agen[x], agen[x+1], agen[x+2], sep = "")
  print(rt)
  y <- c(y, rt)
  #ff <- data.frame(y)
  if(rt == "tac"){
    print("stop")
    break
  }
}

ay <- c()
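# Search for the 'att' triplet, moving three bases at a time
while (x + 5 <= length(agen)) { # after x + 3, the triplet spans x + 3 to x + 5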
  x <- x + 3
  art <- paste(agen[x], agen[x+1], agen[x+2], sep = "")
  print(art)
  ay <- c(ay, art)
  #aff<-data.frame(ay)
  if(art == "att"){
    print("stop")
    break
  }
}
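
As for why the original loops misbehave, a likely cause is the comparison itself: comparing a whole data frame to a string yields one logical value per row, while `if` expects a single value. The small sketch below (the two sample triplets are made up for illustration) shows the difference between that comparison and testing only the most recent triplet. In R 4.2.0 and later, passing a condition of length greater than one to `if` is an error; earlier versions only warned and used the first element.

```r
# Two triplets collected so far, in the "a t t" format used in the question
ay <- c("g t a", "a t t")
aff <- data.frame(ay)

# Comparing the data frame gives a logical matrix with one entry per row
cmp <- aff == "a t t"
length(cmp)  # 2, so it is not a valid single condition for if()
cmp[1]       # FALSE: the first triplet, which older R versions would test

# Testing only the newest triplet gives exactly one TRUE/FALSE
last_is_att <- ay[length(ay)] == "a t t"

# any() collapses the matrix if "found anywhere so far" is what is meant
any_att <- any(cmp)
```

This is why the loops above compare `rt` and `art` directly instead of building a data frame on every iteration.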

If you work with DNA sequences more often, you may want to use a specialized R package such as Biostrings.
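
Since the question mentions many stops and starts, the two loops could also be folded into a single one that switches between the two modes. The following is only a sketch under assumptions: `scan_triplets`, `first_stop` and `second_stop` are hypothetical names, not from the code above, and I inferred from the example in the question that after every stop the scan resumes three letters past the start of the matching triplet.

```r
# Hypothetical helper: alternates between a sliding scan (step 1) that ends
# at `first_stop` and a triplet-wise scan (step 3) that ends at `second_stop`.
scan_triplets <- function(bases, first_stop = "tac", second_stop = "att") {
  groups <- list()          # one character vector per start/stop segment
  current <- character(0)   # triplets of the segment being collected
  step <- 1
  target <- first_stop
  x <- 1
  while (x + 2 <= length(bases)) {
    triplet <- paste(bases[x:(x + 2)], collapse = "")
    current <- c(current, triplet)
    if (triplet == target) {
      # Close the segment and switch to the other mode
      groups[[length(groups) + 1]] <- current
      current <- character(0)
      x <- x + 3            # assumption: resume right after the matched triplet
      if (step == 1) {
        step <- 3
        target <- second_stop
      } else {
        step <- 1
        target <- first_stop
      }
    } else {
      x <- x + step
    }
  }
  # Keep whatever was collected after the last stop
  if (length(current) > 0) {
    groups[[length(groups) + 1]] <- current
  }
  groups
}

agen <- strsplit("agttacgtaattatgat", split = "")[[1]]
result <- scan_triplets(agen)
```

For the example string, `result` is a list of three character vectors: `agt, gtt, tta, tac`, then `gta, att`, then `atg, tga, gat`, which matches the grouping described in the question.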

Fex