I am stuck with a probably simple question - how to sum consecutive duplicate rows and remove all but first row. And, if there is a NA
in between two duplicates (such as 2,na,2
) , also sum them and remove all but the first entry.
So far so good, here is my sample data
ia<-c(1,1,2,NA,2,1,1,1,1,2,1,2)
time<-c(4.5,2.4,3.6,1.5,1.2,4.9,6.4,4.4, 4.7, 7.3,2.3, 4.3)
a<-as.data.frame(cbind(ia, time))
sample output
a
ia time
1 1 4.5
2 1 2.4
3 2 3.6
4 NA 1.5
5 2 1.2
6 1 4.9
7 1 6.4
8 1 4.4
9 1 4.7
10 2 7.3
11 1 2.3
12 2 4.3
Now I want to
1.) sum the "time" column of consecutive ia's - i.e., sum the time if the number 1 occurs twice or more right after each other, in my case here sum first and second row of column time to 4.5+2.4
.
2.) if there is a NA
in between two numbers (ia column) which are the same (i.e., ia = 2, NA, 2)
, then also sum all of those times.
3.) keep only first occurence of the ia
, and delete the rest.
In the end, I would want to have something like this:
a
ia time
1 1 6.9
3 2 6.3
6 1 20.4
10 2 7.3
11 1 2.3
12 2 4.3
I found this for summing, but it does not take into account the consecutive factor
aggregate(time~ia,data=a,FUN=sum)
and I found this for deleting
a[cumsum(rle(as.numeric(a[,1]))$lengths),]
although the rle approach keeps the last entry, and I would want to keep the first. I also have no idea how to handle the NAs
.
if I have a pattern of 1-NA-2
then the NA
should NOT be counted with either of them, in this case the NA
row should be removed.