
I have blank/empty values in my dataset after loading it from csv.

I found out that I can do this:

data$col[data$col==""] <- "NA"
data$col <- as.factor(data$col)

to change them to NA, but I have nearly 200 columns, so doing this one column at a time isn't practical. I tried a for loop with several kinds of indexing, but it didn't really work. What am I missing, apart from the fact that the loop overwrites my data with NAs multiple times?

for (i in 1:189) {
  if (class(data[[i]]) == "character") {
    data[data[[i]] == "", ] <- "NA"
  }
}
Kris

3 Answers


If you want to convert all empty strings ("") in your data frame to NA without a loop, do:

df[df==""] = NA

For example:

df = data.frame(id = 1:4, 
                name = c("John","Jill","","Jane"), 
                surname = c("Smith","","Peters",""))

> df
  id name surname
1  1 John   Smith
2  2 Jill        
3  3       Peters
4  4 Jane        

df[df==""] = NA

> df
  id name surname
1  1 John   Smith
2  2 Jill    <NA>
3  3 <NA>  Peters
4  4 Jane    <NA>
R. Schifini
  • Can I ask how df[df==""] works? I can't quite follow it; it looks unusual to me because I'm new to R. My only guess is that df=="" creates a new logical vector/matrix with the same dimensions as df (so 4x3), and the assignment is then applied only where it is TRUE? – Kris Mar 06 '18 at 08:23
  • This is called logical indexing. You guessed right: a logical matrix or vector is used to subset the object, and the assignment changes only the values where the condition is TRUE. – R. Schifini Mar 06 '18 at 10:49
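
The two steps described in these comments can be written out separately. This is a minimal sketch that assumes character (not factor) columns; the intermediate name mask is introduced here for illustration only and does not appear in the answer.

# Example data mirroring the data frame used in the answer above
df <- data.frame(id = 1:4,
                 name = c("John", "Jill", "", "Jane"),
                 surname = c("Smith", "", "Peters", ""),
                 stringsAsFactors = FALSE)

# Step 1: the comparison returns a logical matrix with the same shape as df
mask <- df == ""
dim(mask)   # rows and columns match those of df
sum(mask)   # number of cells that hold an empty string

# Step 2: indexing with that matrix assigns NA only where mask is TRUE
df[mask] <- NA

Splitting the one-liner this way makes it easy to inspect mask before anything in df is overwritten.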

Try this. I generated an example:

test.df <- data.frame(x1=c(NA,2,3,NA), x2=c(1,2,3,4), x3=c(1,"","",4))
test.df[test.df==""] <- NA
RomRom

You can either read the data with the na.strings argument:

read.csv("data2.csv", header=TRUE, na.strings=c("","NA"))

There is already a Stack Overflow question about this.
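
To see the effect of na.strings without a file on disk, here is a small sketch that feeds CSV content in through the text argument of read.csv. The CSV content and the name csv_text are made up for illustration.

# Hypothetical CSV content supplied inline instead of reading data2.csv
csv_text <- "id,name,surname
1,John,Smith
2,Jill,
3,,Peters
4,Jane,"

# Empty fields and the literal string NA are both read as missing values
data <- read.csv(text = csv_text, header = TRUE,
                 na.strings = c("", "NA"),
                 stringsAsFactors = FALSE)

# Count the missing values per column
colSums(is.na(data))

Handling the blanks at read time means no clean-up pass over the columns is needed afterwards.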

Or, following your loop approach:

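for (i in seq_along(data)) {
  # Convert factors to character so that "" can be compared and replaced
  if (is.factor(data[[i]])) data[[i]] <- as.character(data[[i]])
  # Only character columns can hold empty strings; numeric columns keep their type
  if (is.character(data[[i]])) data[[i]][data[[i]] == ""] <- NA
}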
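
The same column-by-column idea can also be written without an explicit loop. This is only a sketch on made-up data; the names is_chr and col are illustrative, and it assumes the blanks live in character columns.

# Hypothetical data with a numeric column that should keep its type
data <- data.frame(x1 = c(NA, 2, 3, NA),
                   x3 = c("1", "", "", "4"),
                   stringsAsFactors = FALSE)

# Flag the character columns once
is_chr <- vapply(data, is.character, logical(1))

# Replace "" with NA inside each character column only
data[is_chr] <- lapply(data[is_chr], function(col) {
  col[col == ""] <- NA
  col
})

str(data)

Restricting the replacement to character columns leaves numeric columns such as x1 untouched.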
benschbob91
  • Yes, I tried read.csv, but some quoted fields contained commas, so it loaded only a few rows instead of the whole data set. Only fread loaded everything correctly. Thank you for the code. – Kris Mar 06 '18 at 08:10