I am currently strugling to remove words from a large dataframe in R. This is the df:
The first column (GeneID) contains a so called "ensembl gene ID". First one i.e. ENSG00000223972.5 followed by a "|". Afterwards, the real Gene name is listed. So i now want to remove the "ensembl gene ID" including the "|" to keep only the real gene name in this column. Is there a smart way to do this ? For example with the stringR package?
Cheers!
Edit:
> dput(head(data3))
structure(list(GeneID = c("ENSG00000223972.5|DDX11L1", "ENSG00000227232.5|WASH7P",
"ENSG00000278267.1|MIR6859-1", "ENSG00000243485.5|MIR1302-2HG",
"ENSG00000284332.1|MIR1302-2", "ENSG00000237613.2|FAM138A"),
`DC2-CD5pos-d1` = c(2, 47, 0, 0, 0, 0), `DC2-CD5pos-d2` = c(0,
41, 0, 0, 0, 0), `DC2-CD5pos-d3` = c(2, 31, 0, 0, 0, 0),
`DC2-CD5pos-d4` = c(0, 29, 0, 0, 0, 0), `DC3-d1` = c(1, 36,
0, 0, 0, 0), `DC3-d2` = c(0, 33, 0, 0, 0, 0), `DC3-d3` = c(0,
49, 0, 0, 0, 3), `DC3-d4` = c(0, 27, 0, 0, 0, 0), `DC2-BTLA-S-d1` = c(2,
4, 0, 1, 0, 0), `DC2-BTLA-S-d3` = c(6, 6, 1, 0, 0, 0), `DC2-BTLA-S-d4` = c(2,
1, 0, 0, 0, 0), `DC3-CD163-S-d1` = c(2, 8, 2, 0, 0, 0), `DC3-CD163-S-d3` = c(5,
9, 0, 0, 0, 0), `DC3-CD163-S-d4` = c(0, 5, 0, 0, 0, 0)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))