Pardon my less-than-perfect title, but I am having some issues grasping this.
So here's the manually created data. There are three fields: state, codetype, and code. The reason for this is that I am trying to join a more expansive version of this to a data frame consisting of 1.6 million rows, and I am running out of memory. My thought is that I could greatly reduce the number of rows in this table, industry.
state <- c(32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32,32)
codetype <- c(10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10)
code <- c(522,523,524,532,533,534,544,545,546,551,552,552,561,562,563,571,572,573,574)
industry = data.frame(state,codetype,code)
The desired result would be a two-fold operation. First, I would shorten the codes (six digits in the full data) to their first two digits. That is done via:
library(dplyr)
industry <- industry %>% mutate(twodigit = substr(code, 1, 2))
This would produce a fourth column, twodigit. At present there are 19 rows, but only 6 unique values of twodigit: 52, 53, 54, 55, 56, 57. How would I tell it to remove all rows with a non-unique twodigit value, so that only one row per two-digit code remains?
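For reference, here is a minimal sketch of what I think the second step might look like, assuming dplyr's distinct() is the right tool for this. The name industry_small is just a placeholder I made up, and I am assuming the original code column can be dropped once twodigit exists.

```r
library(dplyr)

# Same sample data as above
industry <- data.frame(
  state    = rep(32, 19),
  codetype = rep(10, 19),
  code     = c(522, 523, 524, 532, 533, 534, 544, 545, 546, 551,
               552, 552, 561, 562, 563, 571, 572, 573, 574)
)

# industry_small is a hypothetical name for the reduced table
industry_small <- industry %>%
  # substr() coerces the numeric code to character, so twodigit is a character column
  mutate(twodigit = substr(code, 1, 2)) %>%
  # keep one row per state/codetype/twodigit combination; code is dropped
  distinct(state, codetype, twodigit)

nrow(industry_small)  # 6
```

If the first full code for each two-digit group needs to be kept as well, distinct(twodigit, .keep_all = TRUE) would be the variant to try. Note that it keeps whichever row comes first within each group.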