I have a data frame in R with 11M rows and 46 columns. Some of the fields contain empty strings (""). I need to replace those empty strings with NA because write.dta (in the foreign package) can't handle empty strings.

My for loop, however, is very slow (around 15 minutes per column, and sometimes R or the entire system crashes). I'm running RStudio (R 3.0.2) on a Mac with 8 GB of RAM. Does anyone know of a faster way?

for (i in 1:46) {
  if (length(which(myDF[, i] == "")) != 0) {
    myDF[, i][which(myDF[, i] == "")] <- NA
  }
}
Frank
yumba

2 Answers

10

This should work:

myDF[myDF==''] <- NA
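
To see what this one-liner does, here is a minimal sketch on a toy data frame. The column names and values are made up for illustration and only stand in for the real 11M-row myDF:

```r
# Toy stand-in for myDF; the columns here are hypothetical.
myDF <- data.frame(
  id   = 1:4,
  name = c("a", "", "c", ""),
  city = c("", "x", "y", "z"),
  stringsAsFactors = FALSE
)

# myDF == "" builds a logical matrix with one cell per field;
# every TRUE cell is then replaced with NA.
myDF[myDF == ""] <- NA

# Count the NAs per column to confirm the replacement.
colSums(is.na(myDF))
```

Note that the comparison allocates a logical matrix the same shape as the data frame, so on very large data the memory cost may be noticeable.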
Zbynek
  • Do you have a suggestion to make this work with data frames that have POSIX columns? I like your solution, but it produces the error "Error in as.POSIXlt.character(x, tz, ...): character string is not in a standard unambiguous format", and I would like to preserve data types. – trisaratops Aug 07 '17 at 19:24
  • @triSaratops I think you cannot have an empty string in a POSIXlt column - `as.POSIXct('')` produces an error. Could you please post an example of your data? – Zbynek Aug 16 '17 at 08:55
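
One possible way around the date-time error raised in the comments is to compare only the character columns against "" and leave every other column untouched. The following is a sketch under that assumption, not something taken from the answer itself; the toy columns and the chr_cols helper variable are hypothetical:

```r
# Toy data with one character column and one POSIXct column (made up).
myDF <- data.frame(
  name = c("a", "", "c"),
  when = as.POSIXct(c("2017-08-07 19:24:00",
                      "2017-08-16 08:55:00",
                      "2017-08-17 10:00:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Names of the columns that are plain character vectors.
chr_cols <- names(myDF)[vapply(myDF, is.character, logical(1))]

# Replace "" with NA one column at a time; which() drops any existing NAs
# from the index, and non-character columns are never compared to "".
for (col in chr_cols) {
  x <- myDF[[col]]
  x[which(x == "")] <- NA
  myDF[[col]] <- x
}

str(myDF)
```

Working column by column also avoids building a logical matrix for the whole data frame at once, which may help on very large data, though that is not measured here. Factor columns are skipped by this sketch and would need separate handling.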
2

You can also use the is.na<- function:

is.na(myDF) <- myDF == ''
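
As far as I know, the default is.na<- method simply assigns NA at the indexed positions, so this form should give the same result as indexing with the logical matrix directly. A small sketch that checks this on made-up data (df1 and df2 are hypothetical names):

```r
# Two identical toy data frames (hypothetical contents).
df1 <- data.frame(
  name  = c("a", "", "c"),
  score = c(1.5, 2, 3),
  stringsAsFactors = FALSE
)
df2 <- df1

# Approach 1: index with the logical matrix and assign NA.
df1[df1 == ""] <- NA

# Approach 2: the replacement form of is.na().
is.na(df2) <- df2 == ""

# Compare the two results.
identical(df1, df2)
```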
Sven Hohenstein