I want to write a for loop that converts specific columns of my dataset into either factors or integers.
Which conversion is applied depends on the name of the column.
# Here is a small reproducible dataset
df <- data.frame(x = c(10,20,30), y = c("yes", "no", "no"), z = c("Big", "Small", "Average"))
# here is a vector that we are going to use inside our if statement
column_factor_names <- c("y", "z")
# for each column in df
for (i in names(df)) {
  print(i)
  # if it's a factor, convert into factor, else convert it into integer
  if (i %in% column_factor_names) {
    print("it's a factor")
    df$i <- as.factor(df$i)
  } else {
    print("it's an integer")
    df$i <- as.integer(df$i)
  }
}
When I run this, I get:
Error in `$<-.data.frame`(`*tmp*`, "i", value = integer(0)) :
  replacement has 0 rows, data has 3
The problem seems to be with the lines df$i <- as.factor(df$i)
and df$i <- as.integer(df$i)
inside the if-else statement.
What I don't understand is that the same conversions work when I run them manually. For example:
df$"x" <- as.integer(df$"x")
df$"y" <- as.factor(df$"y")
df$"z" <- as.factor(df$"z")
str(df)
This gives the expected result:
'data.frame': 3 obs. of 3 variables:
$ x: int 10 20 30
$ y: Factor w/ 2 levels "no","yes": 2 1 1
$ z: Factor w/ 3 levels "Average","Big",..: 2 3 1
My question is: why does this not work inside the for loop and if statement?
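A likely explanation, offered as a sketch rather than a definitive answer: `$` takes the name after it literally, so `df$i` looks for a column called "i" instead of the column whose name is stored in `i`. No such column exists, so `df$i` is NULL, and `as.integer(NULL)` has length 0, which would account for the "replacement has 0 rows" message. The manual version works because `df$"x"` names the column directly. Assuming the same sample data as above, using `[[` (which evaluates its argument) should behave as intended:

```r
# Same sample data as in the question
df <- data.frame(x = c(10, 20, 30),
                 y = c("yes", "no", "no"),
                 z = c("Big", "Small", "Average"))
column_factor_names <- c("y", "z")

# `$` does not evaluate i: df$i looks for a column literally named "i"
i <- "x"
is.null(df$i)             # TRUE, because no column is named "i"
length(as.integer(NULL))  # 0, a zero-length value cannot fill 3 rows

# `[[` evaluates its argument, so df[[i]] is the column named by i
for (i in names(df)) {
  if (i %in% column_factor_names) {
    df[[i]] <- as.factor(df[[i]])
  } else {
    df[[i]] <- as.integer(df[[i]])
  }
}
str(df)
```

The same loop can also be written with `df[, i]`, but `[[` is the usual choice when selecting a single column by a name held in a variable.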