Convert factor above certain value to NA

Question

I have a data.frame with 2.5 million obs. of 32 variables, all factors. One variable consists of numbers between 0 and 999. I want to convert all the numbers above 99 to NA because the model only accepts numbers with 2 digits.

Thanks,

Tim

Welcome to SO. Please read [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). You should also show [what you have tried](http://mattgemmell.com/2008/12/08/what-have-you-tried/). — zero323, Nov 04 '13 at 08:33
Excuse me. I tried to set the values larger than 99 to NA with the following command: `dataframe[dataframe$postcode > 99] <- NA`. It gives the error: Error in `[<-.data.frame`(`*tmp*`, dataframe$postcode > 99, value = NA) : missing values are not allowed in subscripted assignments of data frames — Tim_Utrecht, Nov 04 '13 at 08:39
Thanks, but I think what @zero323 wants, and I do too, is for you to add that to the question along with a small part of your data frame ([created by `dput`](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example), for instance) that we can test our solutions on. — Backlin, Nov 04 '13 at 08:42
`set.seed(10); dataframe <- data.frame(postcode=as.factor(round(runif(10, 1,200)))); dataframe[as.numeric(levels(dataframe$postcode)[dataframe$postcode]) > 99, 'postcode'] <- NA` — zero323, Nov 04 '13 at 08:45
@James Correct me if I'm wrong but I think that `dataframe$postcode > 99` won't work for factors. — zero323, Nov 04 '13 at 08:51
@zero323 That's true, in that case use `nchar(as.character(dataframe$postcode))>2` — James, Nov 04 '13 at 09:19
Thanks for your comments. @zero323: The formula does not work for me. @James Hard: It indeed does not work for factors. When I use your adjustment, `dataframe[nchar(as.character(dataframe$postcode))>2] <- NA`, I get the following error: Error in `[<-.data.frame`(`*tmp*`, nchar(as.character(data.read$PropertyPostcode)) > : duplicate subscripts for columns. — Tim_Utrecht, Nov 04 '13 at 09:52
After head(dput) I get the following: structure(c(3L, 46L, 66L, 2L, 59L, 30L), .Label = c("10", "11".........) — Tim_Utrecht, Nov 04 '13 at 09:57
@Tim, you need to use two arguments to `[` as in my original comment, otherwise you will only extract complete columns — James, Nov 04 '13 at 10:06
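The points raised in the comments above can be put together in a small sketch. The toy data frame and the names `bad`, `postcode_num` and `rows` are made up for illustration; they are not the asker's data. Comparing a factor with `>` returns `NA` for every element (with a warning), which is a plausible cause of the "missing values are not allowed" error, and the one-argument form of `[` selects columns rather than rows.

```r
# Hypothetical toy data; the real data frame has 2.5 million rows.
dataframe <- data.frame(postcode = factor(c("10", "150", "99", "999", "7")))

# `>` is not meaningful for an unordered factor: R warns and returns NA
# for every element, so the result cannot serve as an assignment index.
bad <- suppressWarnings(dataframe$postcode > 99)

# Convert the factor labels (not the internal integer codes) to numbers.
postcode_num <- as.numeric(as.character(dataframe$postcode))

# which() yields row positions and silently skips any NA comparisons.
rows <- which(postcode_num > 99)

# Two-argument form of `[`: rows first, then the column name.
dataframe[rows, "postcode"] <- NA

# Optionally remove the levels that are no longer used ("150", "999").
dataframe$postcode <- droplevels(dataframe$postcode)
```

The column stays a factor here, which may matter if the downstream model expects factors; whether it does is an assumption about the asker's setup.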

Tay Shin · Accepted Answer · 2013-11-05T07:50:23.963

1

###### making example data set ######
ex=matrix(as.factor(rnorm(6,100,10)),3,2)

ex

#           [,1]      [,2]
# [1,] 113.29893 101.54136
# [2,]  91.55164 101.45872
# [3,] 101.14473  88.19593

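ex2=data.frame(ex)
###### solution ######
# convert every column to numeric (apply() returns a numeric matrix)
ex3=apply(ex2,2,as.numeric)

# set every value above 99 to NA
ex3[ex3>99]=NA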

ex3
#         X1       X2
# 1       NA       NA
# 2 91.55164       NA
# 3       NA 88.19593
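Note that `apply(ex2, 2, as.numeric)` converts every column. In a data frame where only one of the factor columns holds numbers, it may be preferable to convert just that column. The following is a minimal sketch under that assumption; the data frame `df` and its `region` column are hypothetical, and the `as.numeric(levels(f))[f]` idiom is the one recommended in R's `?factor` help page.

```r
# Hypothetical data: one numeric-looking factor plus an unrelated factor.
df <- data.frame(
  postcode = factor(c("10", "150", "99", "999", "7")),
  region   = factor(c("a", "b", "a", "c", "b"))
)

# Map each element to the numeric value of its label. Indexing the
# converted levels by the factor avoids converting every row's string.
f <- df$postcode
num <- as.numeric(levels(f))[f]

# Values above 99 become NA; all other columns are left untouched.
num[num > 99] <- NA
df$postcode <- num
```

The result is a numeric column. If the model needs a factor, `factor(df$postcode)` can be applied afterwards.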

edited Nov 05 '13 at 07:50

answered Nov 05 '13 at 07:39

Tay Shin

