
My dataset looks something like this

ID  YOB   ATT94  GRADE94  ATT96  GRADE96  ATT98 .....
1   1975  1      12       0      NA
2   1985  1      3        1      5
3   1977  0      NA       0      NA
4   ......

(with ATTXX a dummy var. denoting attendance at school in year XX, GRADEXX denoting the school grade)

I'm trying to create a dummy variable that equals 1 if an individual is attending school when they are 19 or 20 years old, e.g. if YOB = 1988 and ATT98 = 1 then the new variable = 1, and so on. I've been attempting this using mutate in dplyr, but I'm new to R (and coding in general!), so I struggle to get anything other than an error from any code I write.

Any help would be appreciated, thanks.

Edit:

So, I've just noticed that something has gone wrong. I changed your code a bit to add another column to the long-format data table. Here is what I did in the end:

df %>%
  melt(id = c("ID", "DOB")) %>%
  tbl_df() %>%
  mutate(dummy = ifelse(value - DOB %in% c(19,20), 1, 0)) 

so it looks something like e.g.

    ID  YOB   VARIABLE  VALUE  dummy
    1   1979  ATT94     1994   1
    1   1979  ATT96     1996   1
    1   1979  ATT98     0      0 
    2   1976  ATT94     0      0
    2   1976  ATT96     1996   1 
    2   1976  ATT98     1998   1

i.e. whenever the ATT variables take a value other than 0 the dummy = 1, even if they're not 19/20 years old. Any ideas what could be going wrong?

Milhouse

3 Answers


On my phone so I can't check this right now but try:

df$dummy[df$DOB==1988 & df$ATT98==1] <- 1

Edit: The above approach will create the column but when the condition does not hold it will be equal to NA

As @Greg Snow mentions, this approach assumes that the column was already created and is equal to zero initially. So you can do the following to get your dummy variable:

df$dummy <- rep(0, nrow(df))
df$dummy[df$DOB==1988 & df$ATT98==1] <- 1
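
A more compact variant, offered as a sketch and not a tested drop-in for the real dataset: build the column in one step from the logical condition. The small `df` below is made up for illustration, and `%in% TRUE` is used on the assumption that rows where `ATT98` is `NA` should get a 0 and not an `NA`.

```r
# Toy data for illustration only (not the asker's dataset)
df <- data.frame(
  DOB   = c(1988, 1988, 1977, 1988),
  ATT98 = c(1, 0, 1, NA)
)

# The condition is TRUE, FALSE, or NA per row; `%in% TRUE` maps NA to FALSE
cond <- df$DOB == 1988 & df$ATT98 == 1
df$dummy <- as.integer(cond %in% TRUE)

df$dummy
```

Because the whole column is assigned at once, there is no need to initialise it to zero first.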
Warner

@Warner shows a way to create the variable (or at least the 1's; the assumption is that the column has already been set to 0). Another approach is to not create a dummy variable explicitly, but to have it created for you in the model syntax (what you asked for is essentially an interaction). If running a regression, this would be something like:

fit <- lm( resp ~ I(DOB==1988):I(ATT98==1), data=df )

or

fit <- lm( resp ~ I( (DOB==1988) & (ATT98==1) ), data=df)
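
To see what the second form puts into the model, one can inspect the design matrix. This is only a sketch on simulated data: `resp` is a hypothetical response and the `DOB`/`ATT98` values are invented.

```r
# Simulated data for illustration; `resp` is a hypothetical response variable
set.seed(1)
df <- data.frame(
  DOB   = rep(c(1988, 1977), each = 10),
  ATT98 = rep(c(1, 0), times = 10)
)
df$resp <- rnorm(nrow(df))

# The logical expression inside I() becomes a TRUE/FALSE term in the design matrix
X <- model.matrix(resp ~ I((DOB == 1988) & (ATT98 == 1)), data = df)

# Compare the generated column with an explicitly built 0/1 dummy
explicit <- as.numeric(df$DOB == 1988 & df$ATT98 == 1)
all(X[, 2] == explicit)
```

The second column of `X` matches the hand-built dummy, so the model term and an explicit dummy column carry the same information.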
Greg Snow

Welcome to the world of code! R's syntax can be tricky (even for experienced coders) and dplyr adds its own quirks. First off, it's useful when you ask questions to provide code that other people can run in order to be able to reproduce your data. You can learn more about that here.

Are you trying to create code that works for all possible values of DOB and ATTx? In other words, do you have a whole bunch of variables that start with ATT and you want to look at all of them? That format is called wide data, and R generally works much better with long data. Fortunately, the reshape2 package can convert wide data to long with melt(). The code below creates a dummy variable with a value of 1 for people who were in school when they were either 19 or 20 years old.

# Load libraries 
library(dplyr)
library(reshape2)

# Create a sample dataset
ATT94 <- runif(500, min = 0, max = 1) %>% round(digits = 0)
ATT96 <- runif(500, min = 0, max = 1) %>% round(digits = 0)
ATT98 <- runif(500, min = 0, max = 1) %>% round(digits = 0)
DOB <- rnorm(500, mean = 1977, sd = 5) %>% round(digits = 0)
df <- cbind(DOB, ATT94, ATT96, ATT98) %>% data.frame()

# Recode ATTx variables with the actual year
df$ATT94[df$ATT94==1] <- 1994
df$ATT96[df$ATT96==1] <- 1996
df$ATT98[df$ATT98==1] <- 1998

# Melt the data into a long format and perform requested analysis
# Note: %in% binds tighter than binary minus, so the subtraction needs parentheses
df %>%
  melt(id = "DOB") %>%
  as_tibble() %>%
  mutate(dummy = ifelse((value - DOB) %in% c(19, 20), 1, 0))
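
One thing to watch in a condition written as `value - DOB %in% c(19, 20)`: R gives `%in%` higher precedence than binary minus, so without parentheses it is read as `value - (DOB %in% c(19, 20))`. `ifelse()` then coerces that number to logical, and any non-zero value counts as TRUE, which would explain a dummy of 1 wherever the attendance value is non-zero. A small sketch with made-up values:

```r
# Made-up values for illustration
value <- c(1994, 1996, 0, 1998)
DOB   <- c(1979, 1976, 1979, 1976)

# Without parentheses: parsed as value - (DOB %in% c(19, 20))
unparenthesised <- ifelse(value - DOB %in% c(19, 20), 1, 0)

# With parentheses: the age in that year is tested against 19 and 20
parenthesised <- ifelse((value - DOB) %in% c(19, 20), 1, 0)

unparenthesised  # 1 1 0 1
parenthesised    # 0 1 0 0
```

Only the second row (1996 - 1976 = 20) is a genuine match, and only the parenthesised version picks it out.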
Andrew Brēza
  • Yeah, I was trying to get something that worked for all DOB. I'm just going through this now, but I think this works well and the reshape2 package looks really useful. I've got a few more dummies to create, but I should be able to figure them out myself now, so thanks! – Milhouse Jul 27 '16 at 13:43
  • Great! If the answer is helpful feel free to select it as your choice so the question will appear as answered to other users. Let me know if you get hung up anywhere trying to get the code to work on your actual dataset. – Andrew Brēza Jul 27 '16 at 14:44