My searches on SO and elsewhere keep turning up interesting solutions to problems with similar search terms, but none that address my issue. I thought I had found a solution, but the resulting error has left me quite puzzled. I'm trying to learn tidyverse approaches better, but I'd appreciate any solution strategy.
Aim: Create new columns in a data frame, one per factor level of an existing column, with each new column named after its level. The solution should be dynamic so that it can be applied to factors with any number of levels.
Test data
df <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)
Which produces as expected
> str(df)
'data.frame': 5 obs. of 2 variables:
$ x: int 1 2 3 4 5
$ y: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
> df
x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
and when finished should look like
> df
x y a b c d e
1 1 a NA NA NA NA NA
2 2 b NA NA NA NA NA
3 3 c NA NA NA NA NA
4 4 d NA NA NA NA NA
5 5 e NA NA NA NA NA
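For comparison, here is a minimal base R sketch that appears to produce this target without a loop. It assumes the test data above, with `y` stored as a factor (`stringsAsFactors = TRUE` is an assumption added so the example also behaves this way on R 4.0 and later). It is not a tidyverse solution, only a reference point for what the result should be.

```r
# Rebuild the test data; stringsAsFactors = TRUE keeps y a factor on R >= 4.0
df <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)

# Indexing with a character vector of new names adds one column per name;
# the single NA is recycled to fill every new column
df[levels(df$y)] <- NA

names(df)
```

Because the column names come straight from `levels(df$y)`, this works for a factor with any number of levels.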
Tidy for loop approach
library(tidyverse)
for (i in 1:length(levels(df$y))) {
df <- mutate(df, levels(df$y)[i] = NA)
}
but that gives me the following error:
> for (i in 1:length(levels(df$y))) {
+ df <- mutate(df, levels(df$y)[i] = NA)
Error: unexpected '=' in:
"for (i in 1:length(levels(df$y))) {
df <- mutate(df, levels(df$y)[i] ="
> }
Error: unexpected '}' in "}"
To troubleshoot, I removed the loop and simplified the mutate call to check that it works in general. It does, with or without quotation marks around the column name (note: I reran the test data to start fresh).
> levels(df$y)[1]
[1] "a"
> df <- mutate(df, a = NA)
> df <- mutate(df, "a" = NA) # works the same as the previous line
> df
x y a
1 1 a NA
2 2 b NA
3 3 c NA
4 4 d NA
5 5 e NA
Substituting the levels() call back in, still without the loop, reproduces the error (note: I reran the test data to start fresh):
> df <- mutate(df, levels(df$y)[1] = NA)
Error: unexpected '=' in "df <- mutate(df, levels(df$y)[1] ="
I continue to get the same error if I use .data=df to specify the dataset, or if I wrap as.character(), paste(), or paste0() around the levels() call, all of which I picked up from various solutions online. Restructuring the code with the %>% pipe does not help either.
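One way to check whether this is a syntax problem rather than a dplyr problem is to ask R's parser directly, without evaluating anything. The sketch below uses only base R; `parses()` is a hypothetical helper written for this illustration. If the failing line is rejected by `parse()` itself, that would suggest the error occurs before `mutate()` ever runs, which would also explain why wrapping the left-hand side in `as.character()` or `paste()` changes nothing.

```r
# Hypothetical helper: TRUE if the text is syntactically valid R, FALSE otherwise.
# parse() only checks syntax; the code is never evaluated, so dplyr is not needed.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

parses("mutate(df, a = NA)")                           # TRUE: plain name before =
parses('mutate(df, "a" = NA)')                         # TRUE: string literal before =
parses("mutate(df, levels(df$y)[1] = NA)")             # FALSE: expression before =
parses("mutate(df, paste0(levels(df$y)[1]) = NA)")     # FALSE: still an expression
```

In a function call, R's grammar accepts only a bare name or a string literal to the left of `=`, so any expression in that position fails at parse time.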
What about the equal sign is unexpected with my levels code substitution (and potential newb mistakes)? Any assistance is greatly appreciated!
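A possible tidyverse route, offered as a sketch rather than a confirmed answer: dplyr supports computed column names through the `:=` operator combined with `!!` unquoting (both come from rlang and are available once dplyr is loaded). The example assumes a dplyr version that supports this syntax (0.7.0 or later) and the test data above with `y` as a factor.

```r
library(dplyr)

# Rebuild the test data; stringsAsFactors = TRUE keeps y a factor on R >= 4.0
df <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)

# Loop over the level names directly instead of over their indices.
# `!!lvl := NA` unquotes the string held in lvl and uses it as the new column name;
# `:=` is needed because R does not allow an expression on the left of `=`.
for (lvl in levels(df$y)) {
  df <- mutate(df, !!lvl := NA)
}

names(df)
```

Looping over `levels(df$y)` rather than `1:length(...)` also avoids the edge case where a factor has zero levels and `1:0` would iterate twice.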