I have a data set of zip codes and house codes.
df = data.frame(zip = c(2900,2900,2900,3200,3100,3200),
                house_code = c('abc','cde','efg','ghi','ijk','klm'))
How do I find the top 2 zip codes by number of house codes?
I think it could be: head(df[df$house_code == 'some value', ]$zip, 2)
where 'some value' is a house_code entry. Note the comma after the condition: it selects rows, not columns.
First use table to match house_codes and zip_codes.
> dftable <- table(df)
> dftable
      house_code
zip    abc cde efg ghi ijk klm
  2900   1   1   1   0   0   0
  3100   0   0   0   0   1   0
  3200   0   0   0   1   0   1
Then use rowSums to find the number of house_codes for each zip_code.
> numHouse <- rowSums(dftable)
> numHouse
2900 3100 3200
   3    1    2
Finally use order to find the top 2.
> names(numHouse)[order(numHouse, decreasing = TRUE)[1:2]]
[1] "2900" "3200"
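
If each row is one house (an assumption; the sample data has no repeated house_code), the row sums of the cross-table are just the row counts per zip. A shorter sketch is to tabulate the zip column directly and sort the counts. The names zip_counts and top2 are only illustrative.

```r
# Same sample data as in the question
df <- data.frame(zip = c(2900, 2900, 2900, 3200, 3100, 3200),
                 house_code = c('abc', 'cde', 'efg', 'ghi', 'ijk', 'klm'))

# Count rows per zip; assumes one row per house
zip_counts <- table(df$zip)

# Sort counts from largest to smallest and keep the names of the first two
top2 <- names(sort(zip_counts, decreasing = TRUE))[1:2]
top2
```

If the same house_code can appear more than once under a zip, calling unique(df) before table would be one way to count each house only once.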