My data table df has a subject column (e.g. "SubjectA", "SubjectB", ...). Each subject answers many questions, and the table is in long format, so there are many rows for each subject. The subject column is a factor. I want to create a new column - call it subject.id - that is simply a numeric version of subject. So for all rows with "SubjectA", it would be 1; for all rows with "SubjectB", it would be 2; etc.

I know that an easy way to do this with dplyr would be to call df %>% mutate(subject.id = as.numeric(subject)). But I was trying to do it this way:

subj.list <- unique(as.character(df$subject))
df %>% mutate(subject.id = which(as.character(subject) == subj.list))

And I get this error:

Error: wrong result size (12), expected 72 or 1

Why does this happen? I'm not interested in other ways to solve this particular problem. Rather, I worry that my inability to understand this error reflects a deep misunderstanding of dplyr or mutate. My understanding is that this call should be conceptually equivalent to:

df$subject.id <- NULL
for (i in 1:nrow(df)) {
   df$subject.id[i] <- which(as.character(df$subject[i]) == subj.list)
}

But the latter works and the former doesn't. Why?

Reproducible example:

library(dplyr)

df <- InsectSprays %>% rename(subject = spray)
subj.list <- unique(as.character(df$subject))

# this works
df$subject.id <- NULL
for (i in 1:nrow(df)) {
   df$subject.id[i] <- which(as.character(df$subject[i]) == subj.list)
}

# but this doesn't
df %>% mutate(subject.id = which(as.character(subject) == subj.list))
Adam Morris

2 Answers

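The issue is that mutate evaluates its expressions on whole columns, in a vectorized way. Inside mutate, subject is the entire column, so which is applied to the full logical vector produced by as.character(subject) == subj.list, not to one row at a time (as in your loop). The result is the set of positions where that vector is TRUE (12 of them in your example), which is neither one value per row nor a single value, hence the error.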
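To see where the 12 comes from, the comparison can be reproduced outside of mutate. This is a minimal sketch in base R using the InsectSprays data from the question; the helper name cmp is only for illustration.

```r
# InsectSprays ships with base R: 72 rows, 6 sprays with 12 rows each.
subject <- InsectSprays$spray
subj.list <- unique(as.character(subject))

# Whole-column comparison, as mutate() sees it: a length-72 vector
# against a length-6 vector, so R recycles subj.list 12 times.
cmp <- as.character(subject) == subj.list
length(cmp)         # 72
length(which(cmp))  # 12: rows where the recycled value happens to match

# The loop compares a single element instead, which yields exactly one index.
which(as.character(subject[1]) == subj.list)  # 1
```

Because 72 is a multiple of 6, the recycling is silent, and which returns 12 positions instead of 72 ids.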

Using rowwise as described here would solve the issue: https://stackoverflow.com/a/24728107/3772587

So, this will work:

df %>% 
  rowwise() %>%
  mutate(subject.id = which(as.character(subject) == subj.list))
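As a side note that goes beyond the approach above, the per-row lookup done by the loop can probably also be written in vectorized form with base R's match, which returns, for each element of its first argument, its position in the second. A sketch under the assumption that df and subj.list are built as in the question (res is a name introduced here):

```r
library(dplyr)

df <- InsectSprays %>% rename(subject = spray)
subj.list <- unique(as.character(df$subject))

# match() returns one position per element, so the result has
# one value per row and mutate() accepts it without rowwise().
res <- df %>%
  mutate(subject.id = match(as.character(subject), subj.list))
```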
Vivid

Since your df$subject is a factor, you could simply do:

df %>% mutate(subj.id=as.numeric(subject))

Or use a left join approach:

subj.df <- df$subject %>% 
    unique() %>% 
    as_tibble() %>% 
    rownames_to_column(var = 'subj.id')

df %>% left_join(subj.df,by = c("subject"="value"))
Rahul
  • Thanks for the response. I know there are other ways to solve the particular problem, but my question is why the one I proposed doesn't work. I'm worried that I have some deep misunderstanding of `mutate` or `dplyr`. – Adam Morris Mar 02 '17 at 16:01
  • @AdamMorris ah, hmm. I can't answer that offhand! Hope someone can help out! – Rahul Mar 03 '17 at 00:08