2

I have data from a survey. It comes from a question that looks like this:

Did you do any of the following activities during your PhD

                             Yes, paid by my school. Yes, paid by me.  No. 

Attended an internationl conference?
Bought textbooks? 

The data is automatically saved in a spreadsheet in this way:

id conf.1 conf.2 conf.3 text.1 text.2 text.3

1    1                              1
2           1               1
3                   1       1
4                   1                    1
5               

This means participant 1 attended a conference paid by her university; particiapnt 2 attended the conference paid by him, and participant 3 didn't go.

I want to merge conf.1, conf.2 and conf.3 and text.1, text.2 and text.3 in single variables

id new.conf new.text

1   1        2
2   2        1
3   3        1
4   3        3

where the number now respresents the categories of the survey question

Thanks for your help
thelatemail
  • 91,185
  • 12
  • 128
  • 188
Bartolome Salom
  • 103
  • 2
  • 5

3 Answers3

2

You don't state whether or not each set of questions can have multiple answers. If so, this approach may not work for you. If that's the case, I suggest making your question more reproducible before moving forward. With that caveat out of the way, give this a whirl:

library(reshape2)
#recreate your data
dat <- data.frame(id = 1:5,
                  conf.1 = c(1,rep(NA,4)),
                  conf.2 = c(NA,1, rep(NA,3)),
                  conf.3 = c(NA,NA,1,1, NA),
                  text.1 = c(NA,1,1,NA,NA),
                  text.2 = c(1, rep(NA,4)),
                  text.3 = c(rep(NA,3),1, NA))

#melt into long format
dat.m <- melt(dat, id.vars = "id")
#Split on the "."
dat.m[, c("variable", "val")] <- with(dat.m, colsplit(variable, "\\.", c("variable", "val")))
#Subset out only the complete cases
dat.m <- dat.m[complete.cases(dat.m),]
#Cast back into wide format
dcast(id ~ variable, value.var = "val", data = dat.m)
#-----
  id conf text
1  1    1    2
2  2    2    1
3  3    3    1
4  4    3    3
Community
  • 1
  • 1
Chase
  • 67,710
  • 18
  • 144
  • 161
0

Here's a base method which will cope with missing values:

confvars <- c("conf.1","conf.2","conf.3")
textvars <- c("text.1","text.2","text.3")

which.sub <- function(x) {
maxsub <- apply(dat[x],1,which.max)
maxsub[(lapply(maxsub,length)==0)] <- NA
return(unlist(maxsub))
}

data.frame(
id = dat$id,
conf = which.sub(confvars),
text = which.sub(textvars)
)

Result:

  id conf text
1  1    1    2
2  2    2    1
3  3    3    1
4  4    3    3
5  5   NA   NA
thelatemail
  • 91,185
  • 12
  • 128
  • 188
  • Thank you. I have one more question: is it possible to convert the reshaped table to a table for Latex showing the names of each level (e.g 1=sponsored by my institution; 2=sponsored by a different institution; 3=No) – Bartolome Salom Jul 23 '12 at 21:21
0

The following solution is very straightforward and I use it a lot. Let's use the same dataframe Chase did above.

dat <- data.frame(id = 1:5,
                  conf.1 = c(1,rep(NA,4)),
                  conf.2 = c(NA,1, rep(NA,3)),
                  conf.3 = c(NA,NA,1,1, NA),
                  text.1 = c(NA,1,1,NA,NA),
                  text.2 = c(1, rep(NA,4)),
                  text.3 = c(rep(NA,3),1, NA))

Now we start by replacing the NAs with zeros.

dat[is.na(dat)] <- 0

Multiplying each column by a different number allows us to simply calculate the new variables.

dat <- transform(dat, conf=conf.1 + 2*conf.2 + 3*conf.3,
                      text=text.1 + 2*text.2 + 3*text.3)

Let's recode the zeros in our new variables (or here for the whole dataset) to NA and were done.

dat[dat == 0] <- NA 

> dat
  id conf.1 conf.2 conf.3 text.1 text.2 text.3 conf text
1  1      1     NA     NA     NA      1     NA    1    2
2  2     NA      1     NA      1     NA     NA    2    1
3  3     NA     NA      1      1     NA     NA    3    1
4  4     NA     NA      1     NA     NA      1    3    3
5  5     NA     NA     NA     NA     NA     NA   NA   NA
Mark Heckmann
  • 10,943
  • 4
  • 56
  • 88