1

I have columns in a dataset that can contain either 0 or 1, but some of the columns contain only 0.

I want to use these numbers as factors, but I still want every column to have the levels 0 and 1. I tried the code below, but I keep getting an error and I can't understand why.

#dataframe df has 100 rows

column_list = c("col1", "col2",  "col3")  

for (col in column_list) {
      #convert number 0 and number 1 to factors
      # (but sometimes the data only has zeros)
      df[,col] <- as.factor(df[,col])

      # I want to force levels to be 0 and 1
      # this is for when the data is completely missing number 1

      levels(df[, col] <- c(0,1))          #give error

      # Error in `[<-.data.frame`(`*tmp*`, , col, value = c(0, 1)) : 
      # replacement has 2 rows, data has 100


      print(levels(df[, col]))
      #this produces "0" "1" or just "0" depending on the column

}
Mel

2 Answers

2

I think you have just put a ) in the wrong place: the parenthesis that closes levels( belongs before the <-, not at the end of the line.

This works:

column_list = c("col1", "col2",  "col3")  
df <- data.frame(matrix(0, nrow = 100, ncol = 3))
names(df) <- column_list

for (col in column_list) {
  #convert number 0 and number 1 to factors
  # (but sometimes the data only has zeros)
  df[,col] <- as.factor(df[,col])

  # I want to force levels to be 0 and 1
  # this is for when the data is completely missing number 1

  levels(df[, col]) <- c(0,1)          #no error anymore


  print(levels(df[, col]))
  #this now produces "0" "1" for every column

}
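
To see why the position of the parenthesis matters, here is a minimal sketch on a small hypothetical data frame (5 rows rather than the 100 in the question; the name msg is only for illustration). With the parenthesis at the end, R first evaluates the inner assignment df[, col] <- c(0, 1), which tries to overwrite the whole column with two values, and only then calls levels() on the result. With the parenthesis before the <-, the values are assigned to the levels instead.

```r
# Hypothetical small data frame with an all-zero column (assumption for illustration)
df <- data.frame(col1 = rep(0, 5))
col <- "col1"
df[, col] <- as.factor(df[, col])

# Misplaced parenthesis: the inner assignment to the column runs first and fails
msg <- tryCatch(
  {
    levels(df[, col] <- c(0, 1))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Correct form: the values are assigned to the levels of the column
levels(df[, col]) <- c(0, 1)
print(levels(df[, col]))
```
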
Shinobi_Atobe
  • Thank you! levels(df[, col]) <- c(0,1) works! It was just the ")". I wasted about four hours trying to find that error. Doh! – Mel Jun 01 '18 at 07:38
1

You have already marked where the error is: that line is not written correctly. It should be:

df[, col] <- factor(df[, col], levels = c(0,1))

You don't even need your previous line (the as.factor call). You can also avoid the for loop and use lapply:

df[column_list] <- lapply(df[column_list], factor, levels = c(0,1))
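
A minimal runnable sketch of a loop-free version, assuming a small hypothetical data frame (the sample values are made up for illustration). It uses lapply with factor(), rather than apply, so that df stays a data frame and each column is rebuilt with both levels declared. One difference worth noting: factor(x, levels = ...) matches the values in the column against the levels, whereas levels(x) <- ... relabels the existing levels by position, so a column containing only 1 would have its values relabelled as 0 by the latter.

```r
# Hypothetical sample data: col2 contains only zeros, col3 contains only ones
df <- data.frame(
  col1 = c(0, 1, 0, 1),
  col2 = c(0, 0, 0, 0),
  col3 = c(1, 1, 1, 1)
)
column_list <- c("col1", "col2", "col3")

# Convert every listed column to a factor with both levels declared up front
df[column_list] <- lapply(df[column_list], factor, levels = c(0, 1))

# Every column now reports both levels
print(sapply(df[column_list], levels))

# The all-ones column keeps its values; level 0 is present but unused
print(table(df$col3))
```
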
Marc P