I have a data frame in the following format:
sample_df <- structure(list(conversationid = c("C1", "C2", "C2", "C2",
"C2", "C2", "C3", "C3", "C3", "C3"),
sentby = c("Consumer","Consumer", "Agent", "Agent", "Agent", "Consumer",
"Agent", "Consumer","Agent", "Agent"),
time = c("2018-04-25 03:54:04.550+0000", "2018-05-11 19:18:05.094+0000",
"2018-05-11 19:18:09.218+0000", "2018-05-11 19:18:09.467+0000",
"2018-05-11 19:18:13.527+0000", "2018-05-14 22:57:10.004+0000",
"2018-05-14 22:57:14.330+0000", "2018-05-14 22:57:20.795+0000",
"2018-05-14 22:57:22.168+0000", "2018-05-14 22:57:24.203+0000"),
diff = c(NA, NA, 0.0687333333333333, 0.00415, 0.0676666666666667, NA, 0.0721,
0.10775, 0.0228833333333333,0.0339166666666667)),
.Names = c("conversationid", "sentby","time","diff"), row.names = c(NA, 10L),
class = "data.frame")
Here conversationid identifies a conversation, which can contain messages sent by either an agent or a consumer. Within each conversation, I would like to keep a running count of the rows where sentby is "Agent" and set the counter to 0 on "Consumer" rows, like this:
Target Output:
conversationid sentby diff agent_counter_flag
C1 Consumer NA 0
C2 Consumer NA 0
C2 Agent 0.06873333 1
C2 Agent 0.00415 2
C2 Agent 0.06766667 3
C2 Consumer NA 0
C3 Agent 0.0721 1
C3 Consumer 0.10775 0
C3 Agent 0.02288333 2
C3 Agent 0.03391667 3
Currently, I am able to partition the data frame and rank all records grouped by conversationid using the following code:
library(data.table)
setDT(sample_df)
sample_df[,Order := rank(time, ties.method = "first"), by = "conversationid"]
sample_df <- as.data.frame(sample_df)
But all this does is rank the records within each conversation, regardless of whether sentby is "Agent" or "Consumer".
Current Output:
conversationid sentby diff Order
C1 Consumer NA 1
C2 Consumer NA 1
C2 Agent 0.06873333 2
C2 Agent 0.00415 3
C2 Agent 0.06766667 4
C2 Consumer NA 5
C3 Agent 0.0721 1
C3 Consumer 0.10775 2
C3 Agent 0.02288333 3
C3 Agent 0.03391667 4
How can I get my data frame to match the target output? Thanks!
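For reference, the rule I am after could be sketched as follows. This is only a minimal sketch, not necessarily the best approach: it assumes the rows are already in time order within each conversation, and it uses a cut-down table `dt` (a name made up for this example) holding just the conversationid and sentby values from sample_df.

```r
library(data.table)

# Same conversationid / sentby values as sample_df; other columns omitted.
# `dt` is a hypothetical name used only for this illustration.
dt <- data.table(
  conversationid = c("C1", "C2", "C2", "C2", "C2", "C2", "C3", "C3", "C3", "C3"),
  sentby = c("Consumer", "Consumer", "Agent", "Agent", "Agent", "Consumer",
             "Agent", "Consumer", "Agent", "Agent")
)

# cumsum() gives a running count of Agent rows within each conversation;
# multiplying by the logical mask sets the counter to 0 on non-Agent rows.
dt[, agent_counter_flag := cumsum(sentby == "Agent") * (sentby == "Agent"),
   by = conversationid]

print(dt)
```

If the rows were not already sorted, they would presumably need to be ordered by conversationid and time first (for example with setorder) before computing the counter.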