
I have a data set of zip codes and house codes.

 df = data.frame(zip = c(2900,2900,2900,3200,3100,3200),
                 house_code = c('abc','cde','efg','ghi','ijk','klm'))

How can I find the top 2 zip codes by number of house_code entries?

codekiller

2 Answers


I think it could be: head(df$zip[df$house_code == 'some value'], 2), where 'some value' is a house_code entry.

Josseline Perdomo

First, use table to cross-tabulate zip against house_code.

> dftable <- table(df)

      house_code
zip    abc cde efg ghi ijk klm
  2900   1   1   1   0   0   0
  3100   0   0   0   0   1   0
  3200   0   0   0   1   0   1

Then use rowSums to count the house_codes for each zip.

> numHouse <- rowSums(dftable)

2900 3100 3200 
   3    1    2 

Finally, use order to pick the top 2.

> names(numHouse)[order(numHouse, decreasing = TRUE)[1:2]]

[1] "2900" "3200"
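
The three steps above can likely be collapsed into one short pipeline. The following is only a sketch, assuming each row of df is one distinct house_code, so that counting rows per zip is the same as counting house codes. The names counts and top2 are made up for illustration.

```r
# Sample data from the question
df <- data.frame(zip = c(2900, 2900, 2900, 3200, 3100, 3200),
                 house_code = c('abc', 'cde', 'efg', 'ghi', 'ijk', 'klm'))

# Count rows per zip (assumes one row per distinct house_code)
counts <- table(df$zip)

# Sort the counts from largest to smallest and keep the first two
top2 <- head(sort(counts, decreasing = TRUE), 2)

# The zip codes are the names of the resulting table
names(top2)
```

If a house_code can repeat within a zip, the rows would need to be de-duplicated first, for example with unique(df), before counting.
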
Barker