I am working with gene ontology data and have this data frame:
> head(BT_Ctrl_go_term, 13)
# A tibble: 13 x 4
   go_term        n gene     go_name
   <chr>      <int> <chr>    <chr>
 1 GO:0001525    15 NRP1     angiogenesis
 2 GO:0001525    15 ANG      angiogenesis
 3 GO:0001525    15 THY1     angiogenesis
 4 GO:0001525    15 ATP5F1B  angiogenesis
 5 GO:0001525    15 ECM1     angiogenesis
 6 GO:0001666     6 ANG      response to hypoxia
 7 GO:0001666     6 CAT      response to hypoxia
 8 GO:0001666     6 HSP90B1  response to hypoxia
 9 GO:0002250     8 IGKV1-27 adaptive immune response
10 GO:0002250     8 IGHV3-21 adaptive immune response
11 GO:0002250     8 TNFRSF21 adaptive immune response
12 GO:0002250     8 IGLV2-11 adaptive immune response
13 GO:0002250     8 IGHV4-34 adaptive immune response
I need to rearrange the data so that each go_name appears on exactly one row. I also need a new column, genes, that lists every BT_Ctrl_go_term$gene belonging to the corresponding BT_Ctrl_go_term$go_name. The gene names must be separated by ", " (a comma followed by a space).
Expected output:
  go_term     n  go_name                  genes
1 GO:0001525  15 angiogenesis             NRP1, ANG, THY1, ATP5F1B, ECM1
2 GO:0001666  6  response to hypoxia      ANG, CAT, HSP90B1
3 GO:0002250  8  adaptive immune response IGKV1-27, IGHV3-21, TNFRSF21, IGLV2-11, IGHV4-34
A dplyr solution is preferable.
Data
BT_Ctrl_go_term <- structure(list(go_term = c("GO:0001525", "GO:0001525", "GO:0001525",
"GO:0001525", "GO:0001525", "GO:0001666", "GO:0001666", "GO:0001666",
"GO:0002250", "GO:0002250", "GO:0002250", "GO:0002250", "GO:0002250"
), n = c(15L, 15L, 15L, 15L, 15L, 6L, 6L, 6L, 8L, 8L, 8L, 8L,
8L), gene = c("NRP1", "ANG", "THY1", "ATP5F1B", "ECM1", "ANG",
"CAT", "HSP90B1", "IGKV1-27", "IGHV3-21", "TNFRSF21", "IGLV2-11",
"IGHV4-34"), go_name = c("angiogenesis", "angiogenesis", "angiogenesis",
"angiogenesis", "angiogenesis", "response to hypoxia", "response to hypoxia",
"response to hypoxia", "adaptive immune response", "adaptive immune response",
"adaptive immune response", "adaptive immune response", "adaptive immune response"
)), row.names = c(NA, -13L), class = c("tbl_df", "tbl", "data.frame"
))
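
For reference, one possible approach is sketched below. It is not necessarily the only or best dplyr idiom: it groups by the columns that are constant within a GO term and collapses gene with toString(), which joins values with ", ". The name go_df is a hypothetical stand-in that rebuilds the same 13 rows as the dput() output so the snippet runs on its own, and the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for BT_Ctrl_go_term: same 13 rows, built compactly.
sizes <- c(5, 3, 5)
go_df <- tibble(
  go_term = rep(c("GO:0001525", "GO:0001666", "GO:0002250"), times = sizes),
  n       = rep(c(15L, 6L, 8L), times = sizes),
  gene    = c("NRP1", "ANG", "THY1", "ATP5F1B", "ECM1",
              "ANG", "CAT", "HSP90B1",
              "IGKV1-27", "IGHV3-21", "TNFRSF21", "IGLV2-11", "IGHV4-34"),
  go_name = rep(c("angiogenesis", "response to hypoxia",
                  "adaptive immune response"), times = sizes)
)

# One row per GO term; go_term, n and go_name are constant within a term,
# so grouping by all three keeps them as columns in the result.
result <- go_df %>%
  group_by(go_term, n, go_name) %>%
  summarise(genes = toString(gene), .groups = "drop")  # toString() joins with ", "

print(result)
```

With the real data, replacing go_df with BT_Ctrl_go_term should be enough. If a different separator is needed, paste(gene, collapse = "; ") can be used in place of toString(gene).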