I'm trying to find out whether there is any conditional dependence within two different DNA sequences in R.
This is my code; however, I'm getting an error:
Error in `[.data.frame`(data, i) : undefined columns selected
I'm not sure where the issue is. If I put parentheses around data[i-1]==bases[b2], I just get multiple "unexpected }" errors, and that is the only other thing I can think of to do.
for (b1 in 1:length(bases))
{
  for (b2 in 1:length(bases))
  {
    count = 1
    for (i in 2:length(mydata1))
    {
      if ((mydata1[i]==bases[b1]) & mydata1[i-1]==bases[b2])
      {
        count = count+1
      }
    }
    b3 = c(bases[b1], bases[b2], count)
    print(b3)
  }
}
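A possible clue to the error, offered as a guess rather than a diagnosis: if mydata1 is a one-column data frame (as the V1 printout suggests), then length() counts columns rather than bases, and mydata1[i] selects a column rather than a row. The sketch below reproduces that behaviour on a made-up data frame; the names toy and seq1 are hypothetical and only stand in for the real data.

```r
# Hypothetical stand-in for mydata1: a one-column data frame like the V1 printout
toy <- data.frame(V1 = c("T", "C", "G", "G", "T"), stringsAsFactors = FALSE)

n_cols <- length(toy)   # number of columns, not the number of bases
n_rows <- nrow(toy)     # number of rows, i.e. the number of bases
print(c(n_cols, n_rows))

# With a single column, 2:length(toy) counts down (2, then 1),
# so a loop over it starts at i = 2
print(2:length(toy))

# toy[2] asks for a second column, which does not exist;
# tryCatch captures the error message instead of stopping the script
res <- tryCatch(toy[2], error = function(e) conditionMessage(e))
print(res)

# Extracting the column gives a plain character vector indexed by position
seq1 <- toy$V1
print(length(seq1))
print(seq1[2])
```

If this is what is happening, looping over the extracted vector (or over 2:nrow(mydata1) with mydata1[i, 1]) may avoid the error.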
I'm expecting essentially a list of certain DNA bases. For example, if the DNA sequence IS conditional upon the previous base, I picture something like:
[1] "A" "C" "002"
[1] "A" "C" "005"
[1] "A" "C" "009"
and so on, which would give me some indication of whether a certain base has any sort of effect on the identity of the following base, by clearly showing a condition where A precedes C.
OK, so essentially mydata1 (there is also mydata2) are DNA sequences, that is to say lists of "A", "G", "C" and "T", each of which is 10,000 bases long, as shown here:
V1
1 T
2 C
3 G
4 G
5 T
6 G
7 G
8 G
9 C
10 A
I'm tasked with trying to determine whether the sequence has bases that are dependent on one another, i.e. whether [1] T affects the presence of [2] C, etc. One of the sequences is dependent; the other is not.
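For illustration, the pair counting described above might be sketched without nested loops by tabulating each base against the one before it. This is only a sketch under assumptions: seq1 is a hypothetical character vector (in practice it would presumably be something like mydata1$V1), filled here with the ten bases shown above, and prev_base, next_base and pair_counts are names made up for the example.

```r
# Hypothetical example sequence: the ten bases from the printout above
seq1 <- c("T", "C", "G", "G", "T", "G", "G", "G", "C", "A")
bases <- c("A", "C", "G", "T")

# Align each base with its successor; fixing the factor levels keeps
# all four bases in the table even if one never occurs
prev_base <- factor(seq1[-length(seq1)], levels = bases)  # positions 1 .. n-1
next_base <- factor(seq1[-1], levels = bases)             # positions 2 .. n

# 4 x 4 table of counts: rows = previous base, columns = following base
pair_counts <- table(previous = prev_base, following = next_base)
print(pair_counts)

# Row-wise proportions, an estimate of P(following | previous);
# a row with no observations comes out as NaN
print(prop.table(pair_counts, margin = 1))
```

On the full 10,000-base sequences, rows of proportions that look alike would suggest the following base does not depend on the previous one, while clearly different rows would suggest that it does. Running chisq.test(pair_counts) on the full table might be one way to put a number on that, though whether it is the intended approach for the task is an assumption.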