
I have a data frame with multiple answers from a sort of census. I want to sum the number of people who actually live in certain places, and to do so I also need to apply a weighting variable: I can't just sum the number of people that the table shows.

  ZONA   ID_DOM   FE_DOM NO_MORAD
1    1 00010001 15.41667        2
2    1 00010001 15.41667        2
3    1 00010001 15.41667        2
4    1 00010001 15.41667        2
5    1 00010001 15.41667        2
6    1 00010002 15.41667        4

To restate: I want the sum of NO_MORAD by ZONA, counting each ID_DOM only once, with everything weighted by FE_DOM.

To just count the number of ID_DOMs, I used

Zona <- count(OD_2017[!duplicated(OD_2017$ID_DOM),], wt = FE_DOM, Zonas=ZONA, name = "N_domicilios")

but now I don't know how to do the weighted sum. I was trying something like

Zona <- OD_2017 %>%
  group_by(ZONA) %>%
  summarise(ID_DOM = n_distinct(ID_DOM), weights(FE_DOM))

but it didn't work.

Any tips?

Thanks

1 Answer

I see pipes in your attempts, but here is one approach using data.table.

Data:

df <- structure(list(ZONA = c(1, 1, 1, 1, 1, 1), ID_DOM = c("00010001", 
"00010001", "00010001", "00010001", "00010001", "00010002"), FE_DOM = c(15.41667, 15.41667, 
15.41667, 15.41667, 15.41667, 15.41667), NO_MORAD = c(2, 2, 2, 
2, 2, 4)), class = "data.frame", row.names = c(NA, -6L))

Code:

library(data.table)
dt <- as.data.table(df)
dt[,unique(.SD)[,.(WeightedSum = sum(FE_DOM * NO_MORAD))],by="ZONA"]

Output:

   ZONA WeightedSum
1:    1    92.50002
Ian Campbell
  • Nice, thanks! Anyway, would you know how to do that using pipes? I guess data.table is a bit slower, am I right? – Aquiles Silva Mar 24 '20 at 19:50
  • I use data.table because it is faster in many of my use cases on very large datasets. See [https://stackoverflow.com/questions/21435339/](https://stackoverflow.com/questions/21435339/) for how others view the pros and cons of each. – Ian Campbell Mar 24 '20 at 19:54
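
For readers who prefer pipes, here is a minimal dplyr sketch of the same calculation. It is not part of the original answer, and it assumes that FE_DOM and NO_MORAD are constant within each ID_DOM, so keeping the first row per household loses nothing. The name `result` is only illustrative.

```r
library(dplyr)

# Same sample data as in the answer above
df <- data.frame(
  ZONA     = c(1, 1, 1, 1, 1, 1),
  ID_DOM   = c("00010001", "00010001", "00010001",
               "00010001", "00010001", "00010002"),
  FE_DOM   = rep(15.41667, 6),
  NO_MORAD = c(2, 2, 2, 2, 2, 4)
)

result <- df %>%
  # Keep one row per household within each zone
  # (assumption: FE_DOM and NO_MORAD do not vary within an ID_DOM)
  distinct(ZONA, ID_DOM, .keep_all = TRUE) %>%
  group_by(ZONA) %>%
  # Weight each household's resident count by its expansion factor
  summarise(WeightedSum = sum(FE_DOM * NO_MORAD))

print(result)
```

On the sample data this should give one row for ZONA 1 with a WeightedSum of 15.41667 * 2 + 15.41667 * 4, matching the data.table output. One difference to keep in mind: `unique(.SD)` in the data.table version removes rows that are duplicated across all columns, while `distinct(ZONA, ID_DOM, .keep_all = TRUE)` keeps only the first row for each household, so the two could disagree if a household's values changed between rows.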