I want to group individuals into different households on the basis of their address and their ownership of parcels. People belong to the same household if they live at the same address and if they are linked by ownership, directly or indirectly, of at least one parcel.
The links between individuals can be direct, i.e. two people have a parcel in common. But the link can also be indirect, through cross-linked forming chains - two people have a parcel in common and one of them has a parcel in common with someone else, and all live at the same address.
Here are some examples:
- if one person (9) lives alone at his address (C), he will have his household alone even if another person (6) also owns his or her parcel (s).
- If two people (12 and 13) live at the same address (F) and own the same parcel (w) then they are part of the same household. But if three people live at the same address (B) but only two people (7 and 8) own the same parcel (r) and a third person (6) who lives at this address (B) but owns another parcel (m) only the two who own the same parcel are from the same household.
- If at the same address (A), 4 persons live (1, 2, 3 and 4), if persons (1, 2 and 3) are linked by the possession of several parcels (m, n and o) then they are part of the same household whereas the person (4) who also lives at this address but who does not possess any of these 3 parcels but another one (p) is not part of the same household.
I have three variables: address id, owner id and parcel id. I would like to get a household number. Here is an example table:
id_address id_owner id_parcel id_household
A 1 m 1
A 1 n 1
A 2 n 1
A 2 o 1
A 3 o 1
A 4 p 2
A 5 q 3
B 6 s 4
B 7 r 5
B 8 r 5
C 9 s 6
D 10 t 7
E 11 u 8
E 11 v 8
F 12 w 9
F 13 w 9
My first instinct was to loop, but I have 800,000 rows and it might take too long.
Sample data where 'id_household' is the variable I want to create:
structure(list(id_address = c("A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "C", "D", "E", "E", "F", "F"), id_owner = c(1L,
1L, 2L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 11L, 12L, 13L
), id_parcel = c("m", "n", "n", "o", "o", "p", "q", "s", "r",
"r", "s", "t", "u", "v", "w", "w"), id_household = c(1L, 1L,
1L, 1L, 1L, 2L, 3L, 4L, 5L, 5L, 6L, 7L, 8L, 8L, 9L, 9L)), class = "data.frame", row.names = c(NA,
-16L))