The reshape
package is your friend, here. In this case, melt()
and untable()
are useful for normalizing the data.
If the example summary data.frame is in a variable called df
, an abbreviated answer is:
# replace total n with number of failures
df$fail = df$n - df$success
df$n = NULL
# melt and untable the data.frame
df = melt(df)
df = untable(df, df$value)
# recode the results, e.g., here by creating a new data.frame
df = data.frame(
test = df$test,
group = df$group,
success = as.numeric(df$variable == "success")
)
This is a great example of a very general problem. The idea is to back calculate the list of data that underlies a cross-tabulation. Given the cross-tabulation, the back-calculated list of data has one row for each datum and contains the attributes of each datum. Here is a post to the inverse of this question.
In "data geek" parlance, this is a question of putting tabulated data in First Normal Form -- if that is helpful to anyone. You can google data normalization, which will help you design agile data.frames that can be cross-tabulated and analyzed in many different ways.
In detail, for melt()
and untable()
to work here, the raw data need to be tweaked a bit to include fail
(number of failures) rather than total n
, but that is simple enough:
df$fail <- df$n - df$success
df$n <- NULL
which gives:
test group success fail
1 A control 1 1
2 A treat 2 1
3 B control 3 2
4 B treat 1 2
Now we can "melt" the table. melt()
can back-calculate the original list of data that was used to create a cross tabulation.
df <- melt(df)
In this case, we get new column called variable
that contains either "success" or "fail", and a column called value
that contains the datum from the original success
or fail
column.
test group variable value
1 A control success 1
2 A treat success 2
3 B control success 3
4 B treat success 1
5 A control fail 1
6 A treat fail 1
7 B control fail 2
8 B treat fail 2
The untable()
function repeats each row of a table according to the value of a numeric "count" vector. In this case, df$value
is the count vector, because it contains the number of successes and fails.
df <- untable(df, df$value)
which will yield one record for each datum, either a "success" or a "fail":
test group variable value
1 A control success 1
2 A treat success 2
2.1 A treat success 2
3 B control success 3
3.1 B control success 3
3.2 B control success 3
4 B treat success 1
5 A control fail 1
6 A treat fail 1
7 B control fail 2
7.1 B control fail 2
8 B treat fail 2
8.1 B treat fail 2
This is the solution. If required, the data can now be recoded to replace "success" with 1 and "fail" with 0 (and get rid of the the extraneous value
and variable
columns...)
df <- data.frame(
test = df$test,
group = df$group,
success = as.numeric(df$variable == "success")
)
This returns the requested solution, tho the rows are sorted differently:
test group success
1 A control 1
2 A treat 1
3 A treat 1
4 B control 1
5 B control 1
6 B control 1
7 B treat 1
8 A control 0
9 A treat 0
10 B control 0
11 B control 0
12 B treat 0
13 B treat 0
Obviously, the data.frame can be resorted, if necessary. How to sort a data.frame in R.