My goal is to make a for loop to convert some specific columns of my dataset into either factors or integers.

The condition is going to be based on the name of the column.


# Here is a small reproducible dataset
df <- data.frame(x = c(10,20,30), y = c("yes", "no", "no"), z = c("Big", "Small", "Average"))

# here is a vector that we are going to use inside our if statement
column_factor_names <- c("y", "z")

# for each column in df
for (i in names(df)) {

    print(i)

    # if it's a factor, convert into factor, else convert it into integer

    if (i %in% column_factor_names) {
        print("it's a factor")
        df$i <- as.factor(df$i)
    } else {
        print("it's an integer")
        df$i <- as.integer(df$i)
    }
}

When I run this, I get: Error in `$<-.data.frame`(`*tmp*`, "i", value = integer(0)) : replacement has 0 rows, data has 3

The problem is with the lines df$i <- as.factor(df$i) and df$i <- as.integer(df$i) in the if-else statement.

What I don't understand is that the same conversions work when I run them manually. For example:

df$"x" <- as.integer(df$"x")
df$"y" <- as.factor(df$"y")
df$"z" <- as.factor(df$"z")

str(df)

This works:

'data.frame':   3 obs. of  3 variables:
 $ x: int  10 20 30
 $ y: Factor w/ 2 levels "no","yes": 2 1 1
 $ z: Factor w/ 3 levels "Average","Big",..: 2 3 1

My question is: why is it not working in the for-loop and if statement?

MDEWITT
RobZ
  • No need for `forloop`, Related, possible duplicate of https://stackoverflow.com/a/33180265/680068 – zx8754 Sep 12 '19 at 10:47

2 Answers


In your code, the subsetting operator $ looks for a column literally named i instead of evaluating the variable i. Subset the data.frame with [[i]] or [, i] instead:

x <- data.frame(x = c(10,20,30), y = c("yes", "no", "no"), z = c("Big", "Small", "Average"))

# here is a vector that we are going to use inside our if statement
column_factor_names <- c("y", "z")

# for each column in x
for (i in names(x)) {

  print(i)

  # if it's a factor, convert into factor, else convert it into integer

  if (i %in% column_factor_names) {
    print("it's a factor")
    x[[i]] <- as.factor(x[[i]])
  } else {
    print("it's an integer")
    x[[i]] <- as.integer(x[[i]])
  }
}

See help("$") for more information.
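
To see why the original loop fails, here is a minimal sketch (the variable names result_dollar, result_bracket and converted are only illustrative). It shows that df$i returns NULL because there is no column called "i", so as.integer() yields a zero-length vector, which is what the "replacement has 0 rows" error complains about:

```r
df <- data.frame(x = c(10, 20, 30), y = c("yes", "no", "no"))
i <- "x"

# `$` takes the name literally: it looks for a column called "i"
result_dollar <- df$i
print(is.null(result_dollar))   # no such column, so this is NULL

# converting NULL gives a zero-length vector, which cannot fill 3 rows
converted <- as.integer(result_dollar)
print(length(converted))

# `[[` evaluates i first, so it retrieves column "x"
result_bracket <- df[[i]]
print(result_bracket)
```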

If you don't mind losing the status messages, you could also convert the factor columns without a loop:

x[column_factor_names] <- lapply(x[column_factor_names], as.factor)
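
A loop-free version of the whole conversion could be sketched as follows. It assumes that every column not listed in column_factor_names should become an integer; column_integer_names and col_classes are hypothetical names introduced only for this example:

```r
df <- data.frame(x = c(10, 20, 30),
                 y = c("yes", "no", "no"),
                 z = c("Big", "Small", "Average"))

column_factor_names <- c("y", "z")
# assumption: all remaining columns are meant to be integers
column_integer_names <- setdiff(names(df), column_factor_names)

# lapply() converts each selected column; assigning back keeps the column names
df[column_factor_names] <- lapply(df[column_factor_names], as.factor)
df[column_integer_names] <- lapply(df[column_integer_names], as.integer)

# inspect the resulting class of every column
col_classes <- sapply(df, class)
print(col_classes)
```

Selecting with single brackets and a character vector (df[column_factor_names]) returns a data.frame, so lapply() iterates over its columns.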
JBGruber

The corrected code for your for-loop section is:

# Here is a small reproducible dataset
df <- data.frame(x = c(10,20,30), y = c("yes", "no", "no"), z = c("Big", "Small", "Average"))

# here is a vector that we are going to use inside our if statement
column_factor_names <- c("y", "z")

for (i in names(df)) {
    print(i)
    if (i %in% column_factor_names) {
        print("it's a factor")
        df[,i] <- as.factor(df[,i])
    } else {
        print("it's an integer")
        df[,i] <- as.integer(df[,i])
    }
}
kashiff007