Language: R
Package: data.table
I have two datasets: One where all the country names are listed (the key dataset), and one where I have some data about a population in which one field is Country of birth. And I want to make data.table of the number of people born in each country. Now there might be some countries where none of the participants were born. How can I create such a table using data.table commands?
Example:
CountryKey = data.table(Country = LETTERS[1:5])
DT = data.table(Person = c(1,2,3,4,5,6,7), Birth = LETTERS[c(1,1,2,3,2,1,3)])
DT
Person Birth
1: 1 A
2: 2 A
3: 3 B
4: 4 C
5: 5 B
6: 6 A
7: 7 C
CountryKey
Country
1: A
2: B
3: C
4: D
5: E
DT[, j = .(.N), by = .(Birth)]
Birth N
1: A 3
2: B 2
3: C 2
Since there was noone born in D, and E they do not show up in the output. However I want to see them with a value of zero.
@Frank asnwered this question.
DT[CountryKey, on=c(Birth = "Country"), .N, by=.EACHI]
Birth N
1: A 3
2: B 2
3: C 2
4: D 0
5: E 0
However, the CountryKey file might not be unique in its rows. Let's say it is just some dataset similar to DT, with many rows of the same country. How can we do it then?
CountryKey = data.table(Country =c('A','B','B','C','C','D','E','D','D'))
Country
1: A
2: B
3: B
4: C
5: C
6: D
7: E
8: D
9: D
The above solution gives this result which is not what I want. I want to see the unique entries for each country only.
DT[CountryKey, on=c(Birth = "Country"), .N, by=.EACHI]
Birth N
1: A 3
2: B 2
3: B 2
4: C 2
5: C 2
6: D 0
7: E 0
8: D 0
9: D 0
@Frank's answer partially works:
DT[unique(CountryKey), on=c(Birth = "Country"), .N, by=.EACHI]
Birth N
1: A 3
2: B 2
3: C 2
4: D 0
5: E 0
However, if the CountryKey has several columns, it does not:
CountryKey = data.table(Country =c('A','B','B','C','C','D','E','D','D'),
Somevalue = sample(x = 3, size = 9, replace = T))
DT[unique(CountryKey), on=c(Birth = "Country"), .N, by=.EACHI]
Birth N
1: A 3
2: B 2
3: B 2
4: C 2
5: C 2
6: D 0
7: E 0
8: D 0
Solution by @Frank:
DT[i = unique(CountryKey$Country), on="Birth", .N, by=.EACHI]
P.S. : You should have 1.9.7. Earlier versions will give an error.