I need to group my data by province ID (MUN_RESID) and population (V16). My data frame contains 8,627,071 observations. I have spent days trying solutions from this forum, such as this and this, but nothing works. Any help would be greatly appreciated. Thank you very much.
This is what the sample looks like:
              X MUN_RESID   V16 X08.2005_P  X09.2005_P X10.2005_P
1             1    110001 13203          0 0.007574036          0
2             2    110001 13203          0 0.007574036          0
3             3    110001 13203          0 0.007574036          0
4             4    110001 13203          0 0.007574036          0
5             5    110001 13203          0 0.007574036          0
6             6    110001 13203          0 0.007574036          0
7             7    110001 13203          0 0.007574036          0
8627069 8627069    530010 14802          0 0.000000000          0
8627070 8627070    530010 14802          0 0.000000000          0
8627071 8627071    530010 14802          0 0.000000000          0
months0606 <- structure(list(
  X = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8627069L, 8627070L, 8627071L),
  MUN_RESID = c(110001L, 110001L, 110001L, 110001L, 110001L, 110001L, 110001L,
                530010L, 530010L, 530010L),
  V16 = c(13203L, 13203L, 13203L, 13203L, 13203L, 13203L, 13203L,
          14802L, 14802L, 14802L),
  X08.2005_P = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  X09.2005_P = c(0.00757403620389305, 0.00757403620389305, 0.00757403620389305,
                 0.00757403620389305, 0.00757403620389305, 0.00757403620389305,
                 0.00757403620389305, 0, 0, 0),
  X10.2005_P = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
), class = "data.frame",
row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8627069L, 8627070L, 8627071L))
I've tried:
months0606_grouped <- ddply(months0606, .(V16))
(This does not give me any output at all.)
library(dplyr)
months0606 %>% group_by(MUN_RESID, V16)
months0606 %>% dplyr::group_by(MUN_RESID)
Neither call gives me an error or a warning, but neither seems to do any grouping. This is the output:
# A tibble: 8,627,071 x 20
# Groups: MUN_RESID [5,227]
X MUN_RESID V16 X08.2005_P X09.2005_P X10.2005_P
<int> <int> <int> <dbl> <dbl> <dbl>
1 1 110001 13203 0 0.00757 0
2 2 110001 13203 0 0.00757 0
3 3 110001 13203 0 0.00757 0
4 4 110001 13203 0 0.00757 0
5 5 110001 13203 0 0.00757 0
6 6 110001 13203 0 0.00757 0
7 7 110001 13203 0 0.00757 0
8 8 110001 13203 0 0.00757 0
9 9 110001 13203 0 0.00757 0
10 10 110001 13203 0 0.00757 0
# ... with 8,627,061 more rows, and 14 more variables:
# X11.2005_P <dbl>, X12.2005_P <dbl>,
# X01.2006_P <dbl>, X02.2006_P <dbl>,
# X03.2006_P <dbl>, X04.2006_P <dbl>,
# X05.2006_P <dbl>, X06.2006_P <dbl>,
# X07.2006_P <dbl>, X08.2006_P <dbl>,
# X09.2006_P <dbl>, X10.2006_P <dbl>,
# X11.2006_P <dbl>, X12.2006_P <dbl>
I also tried:
months0606$V16 <- with(months0606, ifelse(V16 %in% months0606, "V16"))
My goal is for the data frame to look like this, with exactly one row for every combination of MUN_RESID and population level (V16):
MUN_RESID   V16 X08.2005_P X09.2005_P
   110001 13203          0   0.007574
   530010 14802          0   0.000000
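To make the target concrete, here is a minimal sketch of the kind of collapse I am after. It rests on assumptions: that dplyr 1.0.0 or later is installed (for `across()` and `.groups`), that `mean` is an acceptable way to combine the rows within a group (in the sample the rows within a group are identical, so the mean just returns that value), and that only the columns ending in `_P` need to be kept. The name `collapsed` and the shortened sample data are only for illustration.

```r
library(dplyr)

# Shortened version of the sample data above (illustrative only)
months0606 <- data.frame(
  X = c(1L, 2L, 3L, 8627069L, 8627070L),
  MUN_RESID = c(110001L, 110001L, 110001L, 530010L, 530010L),
  V16 = c(13203L, 13203L, 13203L, 14802L, 14802L),
  X08.2005_P = c(0, 0, 0, 0, 0),
  X09.2005_P = c(0.00757403620389305, 0.00757403620389305,
                 0.00757403620389305, 0, 0)
)

# group_by() only marks the groups; summarise() is the step that
# reduces each MUN_RESID/V16 combination to a single row.
# Assumption: mean() is the right way to combine rows within a group.
collapsed <- months0606 %>%
  group_by(MUN_RESID, V16) %>%
  summarise(across(ends_with("_P"), mean), .groups = "drop")

print(collapsed)
```

If the rows within each group are guaranteed to be identical, `distinct(months0606, MUN_RESID, V16, .keep_all = TRUE)` might be a simpler alternative, since it keeps the first row of each combination without computing anything.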