I want to sum about 10000 columns like colSparseX over the 1500 sparse rows of a data frame, grouped by colID.
On OriginalDataframe I tried this:
coldatfra <- aggregate(. ~colID,datfra,sum)
and this:
coldatfra <- ddply(datfra, .(colID), numcolwise(sum))
but neither works.
Given this input:
colID <- c(rep(1:6, 2), rep(1:2, 3))
colSparse1 <- c(rep(1,5), rep(0,4), rep(1,2), rep(0,5), rep(1,2))
cPlSpars2 <- c(rep(1,3), rep(0,6), rep(1,2), rep(0,5), rep(1,2))
coMSparse3 <- c(rep(1,6), rep(0,3), rep(1,2), rep(0,5), rep(1,2))
colSpArseN <- c(rep(1,2), rep(0,7), rep(1,2), rep(0,5), rep(1,2))
(datfra <- data.frame(colID, colSparse1, cPlSpars2, coMSparse3, colSpArseN))
colID colSparse1 cPlSpars2 coMSparse3 colSpArseN
1 1 1 1 1
2 1 1 1 1
3 1 1 1 0
4 1 0 1 0
5 1 0 1 0
6 0 0 1 0
1 0 0 0 0
2 0 0 0 0
3 0 0 0 0
4 1 1 1 1
5 1 1 1 1
6 0 0 0 0
1 0 0 0 0
2 0 0 0 0
1 0 0 0 0
2 0 0 0 0
1 1 1 1 1
2 1 1 1 1
I want to sum the elements for each colID across all the colSparse columns (about 10000 of them, so the solution needs some placeholder instead of explicit column names, because the names are highly variable words) in order to get this:
colID colSparse1 cPlSpars2 coMSparse3 colSpArseN
1 2 2 2 2
2 2 2 2 2
3 1 1 1 0
4 2 1 2 1
5 2 1 2 1
6 0 0 1 0
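For reference, one possible way to reproduce the expected result on the toy data is sketched below. It is only a sketch under assumptions: it uses base R's rowsum(), it assumes every column other than colID is numeric, and it assumes colID contains no missing values. The names value_cols, sums and result are hypothetical helpers introduced for illustration. Nothing here has been checked against OriginalDataframe.

```r
# Toy data from the question
colID      <- c(rep(1:6, 2), rep(1:2, 3))
colSparse1 <- c(rep(1,5), rep(0,4), rep(1,2), rep(0,5), rep(1,2))
cPlSpars2  <- c(rep(1,3), rep(0,6), rep(1,2), rep(0,5), rep(1,2))
coMSparse3 <- c(rep(1,6), rep(0,3), rep(1,2), rep(0,5), rep(1,2))
colSpArseN <- c(rep(1,2), rep(0,7), rep(1,2), rep(0,5), rep(1,2))
datfra <- data.frame(colID, colSparse1, cPlSpars2, coMSparse3, colSpArseN)

# Pick every column except the ID, so no column name has to be typed out
# (value_cols is a hypothetical helper name)
value_cols <- setdiff(names(datfra), "colID")

# rowsum() adds up the rows of a numeric matrix within each group;
# the groups come back as row names, sorted by default
sums <- rowsum(as.matrix(datfra[value_cols]), group = datfra$colID)

# Turn the row names back into a colID column
result <- data.frame(colID = as.numeric(rownames(sums)), sums, row.names = NULL)
print(result)
```

Because the value columns are selected by excluding colID rather than by name, the same call should not need changing for a wide data frame, provided the assumptions above (numeric columns, no missing colID) hold there as well.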
Note: str(OriginalDataframe)
'data.frame': 1500 obs. of 10000 variables:
$ someword : num 0 0 0 0 0 0 0 0 0 0 ...
$ anotherword : num 0 0 0 0 0 0 0 0 0 0 ...
On a smaller version (which was terminated) of OriginalDataframe, ddply(datfra, .(colID), numcolwise(sum)) gives:
colID colSparse1 cPlSpars2 coMSparse3 colSpArseN
1 0019 0 0 0 0
NA <NA> NA NA NA NA
NA.1 <NA> NA NA NA NA
NA.2 <NA> NA NA NA NA
NA.3 <NA> NA NA NA NA