The OP asked for help with the arguments to the cast() function of the reshape package. However, the reshape package was superseded by the reshape2 package from the same package author. According to the package description, the reshape2 package is
"A Reboot of the Reshape Package"
Using reshape2, the desired result can be produced with
reshape2::dcast(wc, PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length,
                value.var = "TARGET_TYPE")
#  PARENT_MOL_CHEMBL_ID ABL EGFR TP53
#1                  C10   1    1    0
#2                 C939   0    0    1
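BTW: The data.table package has implemented (and enhanced) dcast() as well. So, the same result can be produced with the following call, provided wc is a data.table (a plain data frame can be converted with as.data.table(wc) or setDT(wc)):
data.table::dcast(wc, PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length,
                  value.var = "TARGET_TYPE")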
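As a cross-check that needs neither package, the same counts can be sketched in base R with table(). The sample data below is an assumption reconstructed from the output above, because the OP's actual data is not shown, and wide_base is a hypothetical name used only for illustration.

```r
# Hypothetical sample data, reconstructed from the output shown above
wc <- data.frame(
  PARENT_MOL_CHEMBL_ID = c("C10", "C10", "C939"),
  TARGET_TYPE = c("ABL", "EGFR", "TP53"),
  stringsAsFactors = FALSE
)

# Count the rows for every id / target combination (base R only)
counts <- table(wc$PARENT_MOL_CHEMBL_ID, wc$TARGET_TYPE)

# Turn the contingency table into a wide data frame with an explicit id column
wide_base <- as.data.frame.matrix(counts)
wide_base <- cbind(PARENT_MOL_CHEMBL_ID = rownames(wide_base), wide_base)
rownames(wide_base) <- NULL
wide_base
```

This is only meant to illustrate what fun.aggregate = length computes, namely the number of rows per combination. It is not a replacement for dcast() once additional columns come into play.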
Additional columns
The OP mentioned other columns in the data frame which should be shown together with the spread or wide data. Unfortunately, the OP hasn't supplied particular sample data, so we have to consider two use cases.
Case 1: Additional columns go along with the id column
The data could look like
wc
#  PARENT_MOL_CHEMBL_ID TARGET_TYPE extra_col1
#1                  C10         ABL          a
#2                  C10        EGFR          a
#3                 C939        TP53          b
Note that the values in extra_col1 are in line with PARENT_MOL_CHEMBL_ID, i.e., each id has exactly one value of extra_col1.
This is an easy case, because the formula in dcast() accepts ... which represents all other variables not used in the formula:
reshape2::dcast(wc, ... ~ TARGET_TYPE, fun.aggregate = length,
                value.var = "TARGET_TYPE")
#  PARENT_MOL_CHEMBL_ID extra_col1 ABL EGFR TP53
#1                  C10          a   1    1    0
#2                 C939          b   0    0    1
The resulting data.frame does contain all other columns.
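Whether an extra column "goes along with" the id column can be checked before reshaping. The following is a minimal base R sketch under the assumption that the data looks like the sample above. The helper is_constant_per_id() is hypothetical and not part of reshape2 or data.table.

```r
# Hypothetical sample data matching the Case 1 layout above
wc <- data.frame(
  PARENT_MOL_CHEMBL_ID = c("C10", "C10", "C939"),
  TARGET_TYPE = c("ABL", "EGFR", "TP53"),
  extra_col1 = c("a", "a", "b"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if `col` has exactly one distinct value per id
is_constant_per_id <- function(df, id, col) {
  n_distinct <- tapply(df[[col]], df[[id]], function(x) length(unique(x)))
  all(n_distinct == 1)
}

is_constant_per_id(wc, "PARENT_MOL_CHEMBL_ID", "extra_col1")
```

If the check is TRUE for every extra column, the ... formula keeps one row per id. A column for which it is FALSE falls under the second use case.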
Case 2: Additional columns don't go along with the id column
Now, another column is added:
wc
#  PARENT_MOL_CHEMBL_ID TARGET_TYPE extra_col1 extra_col2
#1                  C10         ABL          a          1
#2                  C10        EGFR          a          2
#3                 C939        TP53          b          3
Note that extra_col2 has two different values for C10. This will cause the simple approach to fail: with ... on the left-hand side of the formula, extra_col2 becomes part of the row key, so C10 is split across two rows instead of one. So, a two-step approach has to be implemented: reshaping first and joining afterwards with the original data frame. The data.table package is now used for both steps:
library(data.table)
# reshape from long to wide, result has only one row per id column
wide <- dcast(setDT(wc), PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length,
              value.var = "TARGET_TYPE")
# right join, i.e., all rows of wc are included
wide[wc, on = "PARENT_MOL_CHEMBL_ID"]
#   PARENT_MOL_CHEMBL_ID ABL EGFR TP53 TARGET_TYPE extra_col1 extra_col2
#1:                  C10   1    1    0         ABL          a          1
#2:                  C10   1    1    0        EGFR          a          2
#3:                 C939   0    0    1        TP53          b          3
The result shows the aggregated values in wide format together with any other columns.
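To compare the two approaches side by side, here is a minimal sketch, assuming sample data shaped like the Case 2 table above. The values are copied from the printed output, and the names simple and joined are hypothetical.

```r
library(data.table)

# Hypothetical sample data matching the Case 2 layout above
wc <- data.table(
  PARENT_MOL_CHEMBL_ID = c("C10", "C10", "C939"),
  TARGET_TYPE = c("ABL", "EGFR", "TP53"),
  extra_col1 = c("a", "a", "b"),
  extra_col2 = 1:3
)

# Simple approach: every column matched by ... becomes part of the row key,
# so C10 is split because extra_col2 differs between its two rows
simple <- dcast(wc, ... ~ TARGET_TYPE, fun.aggregate = length,
                value.var = "TARGET_TYPE")
nrow(simple)

# Two-step approach: one row per id first, then join back to the long data
wide <- dcast(wc, PARENT_MOL_CHEMBL_ID ~ TARGET_TYPE, fun.aggregate = length,
              value.var = "TARGET_TYPE")
joined <- wide[wc, on = "PARENT_MOL_CHEMBL_ID"]
nrow(joined) == nrow(wc)
```

In the simple result the counts for C10 are spread over two rows, whereas the joined result repeats the complete per-id counts on every original row.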