
My searches on SO & elsewhere are coming up with interesting solutions to problems that have similar search terms but not my issue. Thought I found a solution, but the error is leaving me quite puzzled. I'm trying to learn tidyverse approaches better, but I appreciate any solution strategies.

Aim: Create new vector columns in a dataframe where each new vector is named from the factor level of an existing dataframe vector. The code solution should be dynamic so that it can be applied to factors with any number of levels.

Test data

df <- data.frame(x=c(1:5), y=letters[1:5], stringsAsFactors=TRUE) # R >= 4.0 no longer converts strings to factors by default

Which produces as expected

> str(df)
'data.frame':   5 obs. of  2 variables:
 $ x: int  1 2 3 4 5
 $ y: Factor w/ 5 levels "a","b","c","d",..: 1 2 3 4 5
> df
  x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e

and when finished should look like

> df
  x y  a  b  c  d  e
1 1 a NA NA NA NA NA
2 2 b NA NA NA NA NA
3 3 c NA NA NA NA NA
4 4 d NA NA NA NA NA
5 5 e NA NA NA NA NA

Tidy for loop approach

library(tidyverse)

for (i in 1:length(levels(df$y))) {
  df <- mutate(df, levels(df$y)[i] = NA)
}

but that gives me the following error:

> for (i in 1:length(levels(df$y))) {
+   df <- mutate(df, levels(df$y)[i] = NA)
Error: unexpected '=' in:
"for (i in 1:length(levels(df$y))) {
  df <- mutate(df, levels(df$y)[i] ="
> }
Error: unexpected '}' in "}"

Troubleshooting, I removed the loop and simplified the mutate to see if it works in general, which it will with or without the quotation marks (note, I reran the test data to start fresh).

levels(df$y)[1]
> "a"

df <- mutate(df, a = NA)
df <- mutate(df, "a" = NA) # works the same as the previous line
> df
  x y  a
1 1 a NA
2 2 b NA
3 3 c NA
4 4 d NA
5 5 e NA

Substituting the levels function back in, but without the loop returns the mutate error (note, I reran the test data to start fresh):

> df <- mutate(df, levels(df$y)[1] = NA)
Error: unexpected '=' in "df <- mutate(df, levels(df$y)[1] ="

I continue to get the same error if I try to use .data=df to specify the dataset, or if I wrap as.character(), paste(), or paste0() around the levels function (approaches I picked up from various solutions online). Restructuring the code with the %>% pipe gives the same error, so R is not just being picky about the call style.

What about the equal sign is unexpected with my levels code substitution (and potential newb mistakes)? Any assistance is greatly appreciated!

Shawn Janzen
  • I'm not sure why not just something like `df[, levels(df$y)] <- NA`? – arg0naut91 Feb 10 '20 at 19:57
  • The argument name you pass to a function isn't evaluated. That is, something like `foo = "na.rm"; mean(c(1, NA), foo = TRUE)` doesn't work. That's more-or-less why your attempt with `levels(df$y)[i] = NA` fails. Have a read of the [Programming with dplyr](https://dplyr.tidyverse.org/articles/programming.html) vignette to learn about workarounds. Or, for the newest approach, try using `{{` and `:=` from `rlang` [as shown in this answer](https://stackoverflow.com/a/59224230/903061). – Gregor Thomas Feb 10 '20 at 20:10
  • Thanks @arg0naut91! Your solution is short, elegant, and works! So simplistic in the approach, it totally eluded me. – Shawn Janzen Feb 10 '20 at 20:40

1 Answer

Posting solutions for others based on the comments received, and so I can mark this question as solved. Please upvote @arg0naut91 and @Gregor for their solutions & guided help.

Test data

df <- data.frame(x=c(1:5), y=letters[1:5], stringsAsFactors=TRUE) # R >= 4.0 no longer converts strings to factors by default

Solution 1: base R

@arg0naut91 provided an elegant base R solution:

df[, levels(df$y)] <- NA
df
  x y  a  b  c  d  e
1 1 a NA NA NA NA NA
2 2 b NA NA NA NA NA
3 3 c NA NA NA NA NA
4 4 d NA NA NA NA NA
5 5 e NA NA NA NA NA
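
If you need this for several dataframes, the one-liner can be wrapped in a small helper. This is only a sketch: `add_level_columns` and its arguments are names I made up for illustration, and skipping levels that collide with existing column names is my own design choice, not part of @arg0naut91's suggestion.

```r
# Hypothetical helper: add one column per factor level, filled with `fill`.
add_level_columns <- function(data, factor_col, fill = NA) {
  lv <- levels(data[[factor_col]])
  if (is.null(lv)) {
    stop("column '", factor_col, "' is not a factor")
  }
  # Assumption: never overwrite a column that already exists
  new_cols <- setdiff(lv, names(data))
  if (length(new_cols) > 0) {
    data[, new_cols] <- fill
  }
  data
}

df <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)
df <- add_level_columns(df, "y")
names(df)
```

Because the levels are looked up inside the function, it works for a factor with any number of levels.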

Solution 2: using !! and :=

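@Gregor's guidance & useful links showed that the name on the left of = in a function call is never evaluated: R reads it as a literal argument name, so an expression like levels(df$y)[i] is a syntax error there. The tidyverse works around this with the := operator, which accepts a computed name on its left-hand side, together with !! to unquote the variable holding that name.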
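
A quick way to see this in base R alone, without dplyr (just a sketch; `nm`, `lst`, `res`, and `lst2` are throwaway names for the demo):

```r
nm <- "a"

# The name left of `=` is taken literally, so the element is called "nm", not "a"
lst <- list(nm = 1)
names(lst)

# An expression left of `=` is not even valid syntax, so it fails at parse time,
# before mutate() (or any other function) gets a chance to run
res <- try(parse(text = "list(levels(f)[1] = NA)"), silent = TRUE)
inherits(res, "try-error")

# Base R workaround: build the list first, then attach the computed name
lst2 <- setNames(list(NA), nm)
names(lst2)
```

That is why the error message is "unexpected '='" rather than anything about dplyr or the dataframe.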

First test with a single new column:

df <- data.frame(x=c(1:5), y=letters[1:5], stringsAsFactors=TRUE) # refresh test data

varlevel <- levels(df$y)[1] # where level 1=a
df <- mutate(df, !!varlevel := NA)
rm(varlevel) # cleanup
df
  x y  a
1 1 a NA
2 2 b NA
3 3 c NA
4 4 d NA
5 5 e NA

Then put it into the for loop to capture each factor level as a new column:

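df <- data.frame(x=c(1:5), y=letters[1:5], stringsAsFactors=TRUE) # refresh test data

for (i in seq_along(levels(df$y))) {   # seq_along() is also safe when there are zero levels
  varlevel <- levels(df$y)[i]          # column name as a string
  df <- mutate(df, !!varlevel := NA)   # !! unquotes the name, := accepts it on the left
  rm(varlevel) # cleanup
}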
df
  x y  a  b  c  d  e
1 1 a NA NA NA NA NA
2 2 b NA NA NA NA NA
3 3 c NA NA NA NA NA
4 4 d NA NA NA NA NA
5 5 e NA NA NA NA NA
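
The loop can also be avoided by building a named list of the new columns and splicing it into a single mutate() call with !!!. This is a sketch of an alternative, not something from the comments; `new_cols` is just a name I picked.

```r
library(dplyr)

df <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)

# One NA per level, named after the levels: list(a = NA, b = NA, ...)
new_cols <- setNames(rep(list(NA), nlevels(df$y)), levels(df$y))

# !!! splices the named list into mutate() as if each column had been typed out
df <- mutate(df, !!!new_cols)
names(df)
```

Each NA is recycled to the number of rows, so the result should match the loop version above.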
Shawn Janzen