
Pardon my less than perfect title, but I'm having some issues grasping this.

So here's the manually created data. There are three fields: state, codetype, and code. I am trying to join a more expansive version of this table, industry, to a data frame of 1.6 million rows, and I am running out of memory. My thinking is that I could greatly reduce the number of rows in industry before joining.

state <- c(32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
codetype <- c(10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10)
code <- c(522,523,524,532,533,534,544,545,546,551,552,552,561,562,563,571,572,573,574)



industry = data.frame(state,codetype,code)

The desired result would be a two-step operation. First, I would shorten the six-digit codes to two digits. That is done via:

industry <- industry %>% mutate(twodigit = substr(code, 1, 2))

This would produce a fourth column, twodigit. At present there are 19 rows, but only 6 unique values of twodigit: 52, 53, 54, 55, 56, 57. How would I tell it to remove all rows with a repeated twodigit value?

Tim Wilcox

2 Answers

We can use distinct and set .keep_all = TRUE to keep all the other columns alongside twodigit:

library(dplyr)
industry %>%
   distinct(twodigit, .keep_all = TRUE)

Another option would be to use duplicated inside filter:

industry %>%
    filter(!duplicated(twodigit))
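
As a minimal base R sketch of what the duplicated approach does (no packages assumed; `first_rows` is just an illustrative name), the first row of each twodigit group is kept and later repeats are dropped:

```r
# Same sample data as in the question.
industry <- data.frame(
  state = 32,
  codetype = 10,
  code = c(522, 523, 524, 532, 533, 534, 544, 545, 546, 551,
           552, 552, 561, 562, 563, 571, 572, 573, 574)
)
industry$twodigit <- substr(industry$code, 1, 2)

# duplicated() is FALSE the first time a value appears and TRUE afterwards,
# so negating it keeps the first row of each twodigit group.
first_rows <- industry[!duplicated(industry$twodigit), ]
first_rows$code
#> [1] 522 532 544 551 561 571
```

Which code survives in each group depends only on row order, so sort first if a particular row should win.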

To make this more efficient, perhaps use a data.table approach:

library(data.table)
setDT(industry)[!duplicated(substr(code, 1, 2))]
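
Tying this back to the memory problem in the question, here is a rough base R sketch of why deduplicating first helps. It assumes the join is on twodigit; `big` is a hypothetical stand-in for the 1.6 million row table, and `lookup` is a cut-down version of industry:

```r
# Hypothetical stand-in for the large table; only the twodigit key matters here.
big <- data.frame(id = 1:4, twodigit = c("52", "52", "55", "57"))

# Cut-down lookup table with repeated twodigit keys.
lookup <- data.frame(
  state = 32,
  codetype = 10,
  code = c(522, 523, 524, 551, 552, 571)
)
lookup$twodigit <- substr(lookup$code, 1, 2)

# Joining against the full lookup repeats each big row once per matching code.
n_full <- nrow(merge(big, lookup, by = "twodigit"))
n_full
#> [1] 9

# Joining against one row per twodigit keeps the original row count.
lookup_small <- lookup[!duplicated(lookup$twodigit), ]
n_small <- nrow(merge(big, lookup_small, by = "twodigit"))
n_small
#> [1] 4
```

With duplicated keys the result grows multiplicatively, which is a plausible source of the memory blow-up described in the question.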
akrun

Using a unique() approach:

library(tidyverse)

state <- c(32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
codetype <- c(10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10)
code <- c(522,523,524,532,533,534,544,545,546,551,552,552,561,562,563,571,572,573,574)
industry = data.frame(state,codetype,code)
industry<-industry %>% mutate(twodigit = substr(code,1,2))


unique(industry$twodigit) %>%
    map_dfr(~filter(industry, twodigit == .x)[1, ])
#>   state codetype code twodigit
#> 1    32       10  522       52
#> 2    32       10  532       53
#> 3    32       10  544       54
#> 4    32       10  551       55
#> 5    32       10  561       56
#> 6    32       10  571       57

Created on 2021-06-10 by the reprex package (v2.0.0)

jpdugo17