I am combining several large datasets in R in which missing values are denoted by ".". I want to do a bulk find-and-replace of "." with NA across the entire dataset (there are ~35 columns and several hundred thousand rows). I've tried ifelse statements on individual columns, but the class of the column changes from factor to character in the process, and when I convert back to factor the values have changed.
Example data.frame:
SHARP_ID YEAR CAL_DATE JUL_DAY ST_TIME OBS_INIT NOISE
23971_p7 2012 28-Jul-12 210 837 RP_CAW 1
23971_p7 2012 2-Jun-12 154 735 RP_CAW 4
23971_p5 2012 28-Jul-12 210 855 RP_CAW 1
23971_p10 2012 28-Jun-12 180 1012 RP_CAW 3
23971_p10 2012 28-Jul-12 210 813 RP_CAW 1
23971_p2 2012 28-Jun-12 180 856 RP_CAW .
23971_p2 2012 28-Jun-12 180 856 RP_CAW 2
23971_p2 2012 28-Jul-12 210 921 RP_CAW 1
23971_p5 2012 2-Jun-12 154 753 RP_CAW .
23971_p5 2012 2-Jun-12 154 753 RP_CAW .
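For reference, here is a minimal sketch of the kind of bulk replacement I am after, using a few of the rows above. The names `txt`, `d2` and `d3` are made up for illustration, and I am assuming the data can be read with `read.table`; the real files may need a different reader or separator.

```r
# Hypothetical reconstruction of a few rows of the example data.
txt <- "SHARP_ID YEAR CAL_DATE JUL_DAY ST_TIME OBS_INIT NOISE
23971_p7 2012 28-Jul-12 210 837 RP_CAW 1
23971_p2 2012 28-Jun-12 180 856 RP_CAW .
23971_p2 2012 28-Jun-12 180 856 RP_CAW 2
23971_p5 2012 2-Jun-12 154 753 RP_CAW ."

# Option 1: declare "." as the missing-value marker at import time,
# so NOISE is read as an integer column containing NA.
d2 <- read.table(text = txt, header = TRUE, na.strings = ".",
                 stringsAsFactors = TRUE)
str(d2$NOISE)

# Option 2: the data are already loaded and "." is a factor level.
d3 <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)
d3[d3 == "."] <- NA     # replace every "." cell in every column with NA
d3 <- droplevels(d3)    # drop the now-unused "." level from the factors
levels(d3$NOISE)
```

Option 1 avoids the factor problem entirely when the files can be re-read; Option 2 is closer to a cell-by-cell find and replace on a data frame that is already in memory.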
I have tried using ifelse, lapply, and gsub, but in all cases the class of the column (in this example, NOISE) changes from factor to character (or to integer, as in the ifelse example below). When I try to switch it back to factor, the values are different. For example:
> d=RP12[,1:24]
> levels(d$NOISE)
[1] "." "0" "1" "2" "3" "4"
> class(d$NOISE)
[1] "factor"
> d$NOISE=ifelse(d$NOISE==".",as.factor("NA"),as.factor(d$NOISE))
> class(d$NOISE)
[1] "integer"
> d$NOISE=as.factor(d$NOISE)
> class(d$NOISE)
[1] "factor"
> levels(d$NOISE)
[1] "1" "2" "3" "4" "5" "6"
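My guess at what is happening, as a small self-contained sketch: `ifelse` seems to drop the factor attributes and return the underlying integer codes, so converting back to factor turns those codes (1 to 6) into the new levels. The vector `f` below is a stand-in for `d$NOISE`, not my real data, and `r`, `g` and `num` are illustrative names.

```r
# Stand-in for d$NOISE with the same six levels.
f <- factor(c(".", "0", "1", "2", "3", "4"))

# ifelse() returns the integer codes, not the labels.
r <- ifelse(f == ".", as.factor("NA"), as.factor(f))
class(r)

# Assigning NA directly keeps the factor intact.
g <- f
g[g == "."] <- NA       # mark "." entries as missing
g <- droplevels(g)      # remove the unused "." level
levels(g)

# Go through character to recover the numeric values, not the codes.
num <- as.numeric(as.character(g))
num
```

If that is right, the thing to avoid is calling `as.factor` (or `as.numeric`) directly on the integer codes.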
I need to do blanket find-and-replace operations for many values in this dataset, and most of the time they will be the equivalent of cell-specific find and replace in Excel. These datasets are all too big to be handled in Excel, so here I am. I am new to data management in R, so please bear with me. Any help is much appreciated.