I am trying to count the number of sequences in a data set that go from A immediately to C and then, after some time in C, go to L. I want to count how many times this occurs and the average time it takes, measured in time periods, which are sectioned off as t_1, t_2, ... etc.
Say that in R I have a data frame with columns ID, t_1, t_2, t_3, ..., where each time column can take the values A, C and L. Given a huge amount of data, how can I find the number of sequences that start with A, are immediately followed by C, and then, after any amount of time (moving along the columns for an individual), arrive at state L?
What I have so far:
Let's say the data is path, which describes the path that each person (identified by an ID number) goes through at each time point.
My attempt at solving the problem filters the rows for each possible case by hand.
But this is extremely inefficient, as I need to write out every case for every time point. How can one achieve this in R efficiently? Thank you! :)
For example:
# Example data: one row per person, one column per time point
ID  <- c("i_1", "i_2", "i_3", "i_4")
t_1 <- c("A", "C", "A", "C")
t_2 <- c("C", "A", "C", "L")
t_3 <- c("L", "C", "L", "L")
t_4 <- c("C", "L", "L", "L")
path <- data.frame(ID = ID, t_1 = t_1, t_2 = t_2, t_3 = t_3, t_4 = t_4)
path
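# Case 1: A at t_1, C at t_2, L at t_3
diff_path_01 <- path[path$t_1 == "A" & path$t_2 == "C" & path$t_3 == "L", ]
diff_path_01
# Case 2: A at t_1, C at t_2 and t_3, L at t_4
diff_path_02 <- path[path$t_1 == "A" & path$t_2 == "C" & path$t_3 == "C" & path$t_4 == "L", ]
diff_path_02
# Case 3: A at t_2, C at t_3, L at t_4
diff_path_03 <- path[path$t_2 == "A" & path$t_3 == "C" & path$t_4 == "L", ]
diff_path_03
nrow(diff_path_03)
# Total number of A > C > L sequences across the hand-written cases
count <- nrow(diff_path_01) + nrow(diff_path_02) + nrow(diff_path_03)
count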
So count is the number of A > C > L sequences. However, for the average time it takes, I am not sure how to approach it. I know that I should be counting the C elements between the A and the L, but I don't know how to implement that.
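One direction that might avoid writing out every case is sketched here. It collapses each person's row into a single string and matches the pattern "A, one or more C, then L" with a regular expression. This is only a rough sketch under some assumptions: every state is a single character, there are no missing values, and "time taken" means the number of periods spent in C. The names time_cols, seqs, hits, time_in_c and mean_time_in_c are made up for illustration.

```r
# Example data from the question
ID  <- c("i_1", "i_2", "i_3", "i_4")
t_1 <- c("A", "C", "A", "C")
t_2 <- c("C", "A", "C", "L")
t_3 <- c("L", "C", "L", "L")
t_4 <- c("C", "L", "L", "L")
path <- data.frame(ID = ID, t_1 = t_1, t_2 = t_2, t_3 = t_3, t_4 = t_4)

# Pick the time columns by name (assumes they all start with "t_"
# and are already in chronological order)
time_cols <- grep("^t_", names(path), value = TRUE)

# Collapse each row into one string, e.g. "ACLC".
# Assumption: single-character states and no NA values
# (an NA would be pasted as the text "NA" and could create false matches).
seqs <- do.call(paste0, path[time_cols])

# Find every non-overlapping run of: A, one or more C, then L
hits <- regmatches(seqs, gregexpr("AC+L", seqs))

# Number of A > C > L sequences per person and in total
n_per_id <- lengths(hits)
count <- sum(n_per_id)

# Periods spent in C for each match = match length minus the A and the L
time_in_c <- nchar(unlist(hits)) - 2
mean_time_in_c <- mean(time_in_c)

count
mean_time_in_c
```

If the time from A to L is wanted instead, that would be time_in_c + 1 steps per match. The string-building and regex steps are vectorised over rows, so this should scale better than enumerating cases by hand, though I have not benchmarked it on large data.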
Hope someone can help, thank you!