How can I convert a two-column "count" matrix to a binary vector in R?

Question

How can I convert a data frame containing a two-column count matrix into a data frame with a single binary vector in R? For example, I have a data frame like this, where id is the id of a subject, s and f are the numbers of "successes" and "failures" for that subject, and x is a third variable describing some trait of that subject.

id s f x
1  0 3 A
2  2 1 A
3  1 2 B

I want this data frame to be converted to:

id n x
1  f A
1  f A
1  f A
2  s A
2  s A
2  f A
3  s B
3  f B
3  f B

where the column n indicates whether each trial is a success (s) or failure (f).

I'm sure I could code up a function to do this, but I'm wondering whether there's a prefab solution.

score 6 · Accepted Answer · edited Jan 06 '15 at 17:26

dd <- read.table(text="id s f x
    1  0 3 A
    2  2 1 A
    3  1 2 B",
    header=TRUE)

with(dd, data.frame(
    id=rep(id, s+f),
    n=rep(rep(c("s","f"), nrow(dd)), c(rbind(s,f))),
    x=rep(x, s+f)))
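
The step that does the work in the code above is `c(rbind(s, f))`, which interleaves the two count columns. A minimal sketch of that step on its own, using the counts from the question (the names `times` and `labels` are only for illustration):

```r
s <- c(0, 2, 1)
f <- c(3, 1, 2)

# rbind() stacks s over f as a 2 x 3 matrix; c() reads a matrix column by
# column, so the counts come out interleaved: s1, f1, s2, f2, s3, f3
times <- c(rbind(s, f))

# the labels alternate in the same order, one label per count
labels <- rep(c("s", "f"), length(s))

# repeat each label by its count; a count of 0 contributes nothing
n <- rep(labels, times)
n
```

Because `rep()` accepts a vector of counts as its second argument, no explicit loop over subjects is needed.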

answered Jan 06 '15 at 02:10 by Ben Bolker · edited Jan 06 '15 at 17:26 by tef2128

Great. Works like a charm. See my function below using this code that works for any data frame, having any number of columns. Hope it helps! – tef2128 Jan 06 '15 at 19:18
What about the opposite? – Bakaburg Jan 13 '15 at 13:46
@Bakaburg, please go ahead and ask a new question. Some version of `table` plus `as.data.frame` plus `cbind` should do it. – Ben Bolker Jan 13 '15 at 14:24
I found a way... in one line: `cbind(as.data.frame(table(df[2:(length(df))])), Success = as.data.frame(table(df[df[1] == 'y', 2:(length(df))]))$Freq)` – Bakaburg Jan 13 '15 at 16:44
I bet there are more elegant ways – Bakaburg Jan 13 '15 at 16:44
You can still post this as a question, answer it yourself, and see if someone comes up with a better/faster/more elegant approach – Ben Bolker Jan 13 '15 at 17:53
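
For the reverse direction raised in the comments, one possible sketch along the lines of the `table` hint is shown below. It is not the commenters' code: the names `long`, `tab`, `subj` and `wide` are illustrative, and it assumes each id has a single value of x.

```r
# long format: one row per trial, as in the desired output of the question
long <- data.frame(id = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                   n  = c("f", "f", "f", "s", "s", "f", "s", "f", "f"),
                   x  = c("A", "A", "A", "A", "A", "A", "B", "B", "B"))

# cross-tabulate id by outcome; factor() fixes the column order to s, f
# and keeps a zero count for subjects with no successes
tab <- table(long$id, factor(long$n, levels = c("s", "f")))

# one row of subject-level information per id, sorted like the table rows
subj <- unique(long[c("id", "x")])
subj <- subj[order(subj$id), ]

wide <- data.frame(id = subj$id,
                   s  = as.vector(tab[, "s"]),
                   f  = as.vector(tab[, "f"]),
                   x  = subj$x)
wide
```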

jazzurro · Answer 2 · 2015-01-06T02:15:09.620

Here is one way using the tidyr and splitstackshape packages. First reshape the data into long format with gather(). Then use expandRows() from splitstackshape, which repeats each row by the number in the value column. For display purposes I used arrange() from the dplyr package, but this part is optional.

library(tidyr)
library(splitstackshape)
library(dplyr)

# mydf is the data frame from the question (columns id, s, f, x)
gather(mydf, variable, value, -id, -x) %>%
  expandRows("value") %>%
  arrange(id, x)


#  id x variable
#1  1 A        f
#2  1 A        f
#3  1 A        f
#4  2 A        s
#5  2 A        s
#6  2 A        f
#7  3 B        s
#8  3 B        f
#9  3 B        f
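
The row-repetition idea behind this answer can also be sketched in base R, without the packages. This is only an illustration of the technique (index the long data frame with repeated row numbers), not the internals of expandRows; `long` and `expanded` are illustrative names.

```r
mydf <- data.frame(id = 1:3, s = c(0, 2, 1), f = c(3, 1, 2),
                   x = c("A", "A", "B"))

# long format: one row per (subject, outcome) pair, similar to what gather() returns
long <- data.frame(id       = rep(mydf$id, 2),
                   x        = rep(mydf$x, 2),
                   variable = rep(c("s", "f"), each = nrow(mydf)),
                   value    = c(mydf$s, mydf$f))

# repeat row i value[i] times (rows with value 0 drop out), then drop the count
expanded <- long[rep(seq_len(nrow(long)), long$value), c("id", "x", "variable")]

# sort by subject, successes before failures, to match the output above
expanded <- expanded[order(expanded$id, expanded$variable != "s"), ]
rownames(expanded) <- NULL
expanded
```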

score 3 · Answer 3 · answered Jan 06 '15 at 19:16

Using Ben Bolker's excellent answer above, I have created a short function that will do this for any data frame containing one column with success counts, one column for failure counts, and any number of additional columns that contain information about each row (subject). See example below.

#####################################################################
### cnt2bin (count to binary) takes a data frame with 2-column ######
### "count" response variable of successes and failures and    ######
### converts it to long format, with one column showing        ######
### 0s and 1s for failures and successes.                      ######
### data is data frame with 2-column response variable         ######
### suc and fail are character strings naming the columns      ######
### containing counts of successes and failures respectively   ######
#####################################################################

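cnt2bin <- function(data, suc, fail) {

  # every column other than the two count columns is carried along
  xvars <- setdiff(names(data), c(suc, fail))
  # repeat each subject's values once per trial (successes + failures)
  expanded <- lapply(data[xvars], rep, times = data[[suc]] + data[[fail]])
  # interleave the counts as s1, f1, s2, f2, ... to match the alternating 1/0 labels
  bin <- rep(rep(c(1, 0), nrow(data)), c(rbind(data[[suc]], data[[fail]])))
  data.frame(bin = bin, as.data.frame(expanded))
}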

In the example below, id is the subject id, s and f are columns counting successes and failures for each subject, and x and y are variables that describe attributes of each subject, to be expanded and added to the final data frame.

dd <- read.table(text="id s f x y
                       1  0 3 A A
                       2  2 1 A B
                       3  1 2 B B",
                  header=TRUE)

cnt2bin(dd, "s", "f")

It's nice to see you thoroughly commenting your function. If you want to form a good habit that will **greatly** help if you ever want to make a package, you could comment functions using [Roxygen2 syntax](http://cran.r-project.org/web/packages/roxygen2/vignettes/rd.html). — Gregor Thomas, Jan 06 '15 at 19:25
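
As a rough illustration of that suggestion, the comment banner could be rewritten as a roxygen2 block along these lines. The tag wording is an assumption rather than the answerer's own documentation, and the function body is a compact equivalent of cnt2bin that looks columns up by name.

```r
#' Expand two-column success/failure counts into one row per trial
#'
#' @param data A data frame with one row per subject.
#' @param suc Name (character string) of the column holding success counts.
#' @param fail Name (character string) of the column holding failure counts.
#' @return A data frame with a column `bin` (1 = success, 0 = failure)
#'   followed by the remaining columns of `data`, repeated once per trial.
#' @examples
#' dd <- data.frame(id = 1:3, s = c(0, 2, 1), f = c(3, 1, 2))
#' cnt2bin(dd, "s", "f")
cnt2bin <- function(data, suc, fail) {
  xvars <- setdiff(names(data), c(suc, fail))
  expanded <- lapply(data[xvars], rep, times = data[[suc]] + data[[fail]])
  bin <- rep(rep(c(1, 0), nrow(data)), c(rbind(data[[suc]], data[[fail]])))
  data.frame(bin = bin, as.data.frame(expanded))
}
```

The `#'` lines are ordinary comments to R itself, so the function runs unchanged whether or not roxygen2 is installed.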
