How can I convert a data frame having a two-column count matrix into a data frame with a single binary vector in R? For example, I have a data frame like this, where id is the ID of a subject, s and f are the numbers of "successes" and "failures" for that subject, and x is a third variable describing some trait of that subject.

id s f x
1  0 3 A
2  2 1 A
3  1 2 B

I want this data frame to be converted to:

id n x
1  f A
1  f A
1  f A
2  s A
2  s A
2  f A
3  s B
3  f B
3  f B

where the column n indicates whether each trial is a success (s) or failure (f).

I'm sure I could code up a function to do this, but I'm wondering whether there's a prefab solution.

tef2128

3 Answers

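# Example data from the question
dd <- read.table(text="id s f x
    1  0 3 A
    2  2 1 A
    3  1 2 B",
    header=TRUE)

with(dd, data.frame(
    # repeat each id once per trial (successes + failures)
    id = rep(id, s+f),
    # interleave the counts as s1, f1, s2, f2, ... and repeat "s"/"f" to match
    n = rep(rep(c("s","f"), nrow(dd)), c(rbind(s,f))),
    # repeat the covariate the same way as id
    x = rep(x, s+f)))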
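
The step that does the real work is the `n` column. As a minimal sketch of what the nested `rep()` calls do, here are the counts from the question taken apart one piece at a time (the names `times` and `labels` are introduced here only for illustration):

```r
s <- c(0, 2, 1)
f <- c(3, 1, 2)

# rbind() stacks s over f; c() then reads the matrix column by column,
# which interleaves the counts as s1, f1, s2, f2, s3, f3.
times <- c(rbind(s, f))

# One "s"/"f" label pair per subject, in the same interleaved order.
labels <- rep(c("s", "f"), length(s))

# Repeat each label by its matching count; a count of 0 contributes nothing.
n <- rep(labels, times)
n
# "f" "f" "f" "s" "s" "f" "s" "f" "f"
```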
Ben Bolker

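Here is one way using the tidyr and splitstackshape packages. First, reshape the data to long format with gather(), so that the s and f counts end up in a single value column. Then expandRows() from splitstackshape repeats each row by the number in the value column; as the output below shows, the value column itself is dropped and rows with a count of 0 disappear. I used arrange() from the dplyr package only for display purposes, so that part is optional. Here mydf is the data frame from the question.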

library(tidyr)
library(splitstackshape)
library(dplyr)

gather(mydf, variable, value, -id, -x) %>%
    expandRows("value") %>%
    arrange(id, x)

#  id x variable
#1  1 A        f
#2  1 A        f
#3  1 A        f
#4  2 A        s
#5  2 A        s
#6  2 A        f
#7  3 B        s
#8  3 B        f
#9  3 B        f
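
In current tidyr releases gather() is superseded by pivot_longer(), and tidyr's own uncount() can stand in for expandRows(). The following is a sketch of that variant, not part of the original answer; it assumes a tidyr version that provides both functions, and the names `long`, `result` and `count` are made up for this example:

```r
library(tidyr)

mydf <- data.frame(id = 1:3, s = c(0, 2, 1), f = c(3, 1, 2),
                   x = c("A", "A", "B"))

# Stack the s and f counts into one column, labelled by outcome in n.
long <- pivot_longer(mydf, cols = c(s, f), names_to = "n", values_to = "count")

# Repeat each row `count` times; uncount() drops the count column by default,
# and rows with a count of 0 vanish.
result <- uncount(long, count)
result
```

Because pivot_longer() keeps the s and f rows of each subject together, no arrange() step is needed to get the rows grouped by id.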
jazzurro

Using Ben Bolker's excellent answer above, I have created a short function that will do this for any data frame containing one column with success counts, one column for failure counts, and any number of additional columns that contain information about each row (subject). See example below.

#####################################################################
### cnt2bin (count to binary) takes a data frame with 2-column ######
### "count" response variable of successes and failures and    ######
### converts it to long format, with one column showing        ######
### 0s and 1s for failures and successes.                      ######
### data is data frame with 2-column response variable         ######
### suc and fail are character expressions for columns         ######
### containing counts of successes and failures respectively   ######
#####################################################################

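cnt2bin <- function(data, suc, fail) {

  # every column other than the two count columns is carried along
  xvars <- setdiff(names(data), c(suc, fail))
  # number of trials (successes + failures) for each row
  trials <- data[[suc]] + data[[fail]]
  # repeat each covariate value once per trial
  covars <- lapply(data[xvars], rep, times = trials)
  # for each row, emit 1 once per success, then 0 once per failure
  bin <- rep(rep(c(1, 0), nrow(data)), c(rbind(data[[suc]], data[[fail]])))
  data.frame(bin = bin, covars)
}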

In the example below, id is the subject ID, s and f are columns counting successes and failures for each subject, and x and y are variables that describe attributes of each subject; these are expanded and added to the final data frame.

dd <- read.table(text="id s f x y
                       1  0 3 A A
                       2  2 1 A B
                       3  1 2 B B",
                  header=TRUE)

cnt2bin(dd, "s", "f")
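
One way to sanity-check a conversion like this is a round trip: summing the 0/1 column per subject should give back the success counts, and counting rows per subject should give back the number of trials. So that it runs on its own, this sketch rebuilds the long format with row indexing rather than calling cnt2bin; `long`, `successes` and `trials` are names introduced here for illustration.

```r
dd <- read.table(text = "id s f x y
1 0 3 A A
2 2 1 A B
3 1 2 B B", header = TRUE)

# Repeat each row index once per trial and keep only the covariates.
long <- dd[rep(seq_len(nrow(dd)), dd$s + dd$f), c("id", "x", "y")]
rownames(long) <- NULL

# Within every subject: 1 for each success, then 0 for each failure.
long$bin <- unlist(Map(function(s, f) rep(c(1, 0), c(s, f)), dd$s, dd$f))

# Round trip: per-subject sum and length of bin recover s and s + f.
successes <- tapply(long$bin, long$id, sum)
trials    <- tapply(long$bin, long$id, length)
stopifnot(all(successes == dd$s), all(trials == dd$s + dd$f))
```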
tef2128
    It's nice to see you thoroughly commenting your function. If you want to form a good habit that will **greatly** help if you ever want to make a package, you could comment functions using [Roxygen2 syntax](http://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html). – Gregor Thomas Jan 06 '15 at 19:25
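
As a rough sketch of what that suggestion might look like, here is the header comment recast with the standard roxygen2 tags `@param`, `@return` and `@examples`. The wording of the documentation is a guess at the author's intent, and the function body is a condensed variant written for this example, not the answer's exact code.

```r
#' Convert success/failure counts to one row per trial
#'
#' @param data A data frame with one column of success counts and one
#'   column of failure counts.
#' @param suc,fail Character strings naming the success and failure columns.
#' @return A data frame with a column `bin` (1 = success, 0 = failure)
#'   followed by the remaining columns of `data`, one row per trial.
#' @examples
#' dd <- data.frame(id = 1:3, s = c(0, 2, 1), f = c(3, 1, 2))
#' cnt2bin(dd, "s", "f")
cnt2bin <- function(data, suc, fail) {
  # Columns other than the two count columns are carried along.
  xvars <- setdiff(names(data), c(suc, fail))
  trials <- data[[suc]] + data[[fail]]
  # Interleave counts as s1, f1, s2, f2, ... and repeat 1/0 to match.
  bin <- rep(rep(c(1, 0), nrow(data)), c(rbind(data[[suc]], data[[fail]])))
  data.frame(bin = bin, lapply(data[xvars], rep, times = trials))
}
```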