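I am currently struggling to remove part of each string in one column of a large data frame in R. The data frame was originally posted as an image; a sample is given as `dput` output in the edit below.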

The first column (GeneID) contains a so-called Ensembl gene ID, for example ENSG00000223972.5, followed by a "|" and then the actual gene name. I want to remove the Ensembl gene ID, including the "|", so that only the gene name remains in this column. Is there a smart way to do this, for example with the stringr package?

Cheers!

Edit:

 > dput(head(data3))
structure(list(GeneID = c("ENSG00000223972.5|DDX11L1", "ENSG00000227232.5|WASH7P", 
"ENSG00000278267.1|MIR6859-1", "ENSG00000243485.5|MIR1302-2HG", 
"ENSG00000284332.1|MIR1302-2", "ENSG00000237613.2|FAM138A"), 
    `DC2-CD5pos-d1` = c(2, 47, 0, 0, 0, 0), `DC2-CD5pos-d2` = c(0, 
    41, 0, 0, 0, 0), `DC2-CD5pos-d3` = c(2, 31, 0, 0, 0, 0), 
    `DC2-CD5pos-d4` = c(0, 29, 0, 0, 0, 0), `DC3-d1` = c(1, 36, 
    0, 0, 0, 0), `DC3-d2` = c(0, 33, 0, 0, 0, 0), `DC3-d3` = c(0, 
    49, 0, 0, 0, 3), `DC3-d4` = c(0, 27, 0, 0, 0, 0), `DC2-BTLA-S-d1` = c(2, 
    4, 0, 1, 0, 0), `DC2-BTLA-S-d3` = c(6, 6, 1, 0, 0, 0), `DC2-BTLA-S-d4` = c(2, 
    1, 0, 0, 0, 0), `DC3-CD163-S-d1` = c(2, 8, 2, 0, 0, 0), `DC3-CD163-S-d3` = c(5, 
    9, 0, 0, 0, 0), `DC3-CD163-S-d4` = c(0, 5, 0, 0, 0, 0)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
Lokihas
  • Does this answer your question? [Remove all text before colon](https://stackoverflow.com/questions/12297859/remove-all-text-before-colon) – benson23 Apr 13 '22 at 16:34
  • Maybe something like `gsub(".*\\|", "", df$GeneID)` – benson23 Apr 13 '22 at 16:35
  • Please do not post (only) an image of code/data/errors: it breaks screen-readers and it cannot be copied or searched (ref: https://meta.stackoverflow.com/a/285557 and https://xkcd.com/2116/). Please include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly. – r2evans Apr 13 '22 at 16:46
  • With this, it appears simple enough to just include `head(dat$GeneID)` to give us *something*. Two reasons why I no longer put much effort into working with images of data: (1) there is often something not immediately apparent that makes it different; this could be `factor`s, extra spaces, unicode, etc. (2) The onus is on the asker to make it as easy as possible for potential answerers to help; I choose not to spend time trying OCR on an image of data that you have readily available on your desktop. When the question is reproducible, you increase the likelihood that we can quickly give you an accurate/effective answer. – r2evans Apr 13 '22 at 16:49
  • @benson23 Yes your code worked perfectly fine, thanks so much! – Lokihas Apr 13 '22 at 16:54
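
For later readers, here is a minimal sketch of the approach suggested in the comments, using a few `GeneID` values copied from the `dput` output above. The variable names `gene_id` and `gene_name` are made up for illustration, and the stringr variant is only an assumed equivalent that runs if the package is installed.

```r
# Sample of the GeneID column, copied from the dput() output in the question
gene_id <- c("ENSG00000223972.5|DDX11L1",
             "ENSG00000227232.5|WASH7P",
             "ENSG00000278267.1|MIR6859-1")

# Base R: remove everything up to and including the first "|"
gene_name <- sub("^[^|]*\\|", "", gene_id)
print(gene_name)

# stringr variant of the same pattern (skipped if stringr is not installed)
if (requireNamespace("stringr", quietly = TRUE)) {
  print(stringr::str_remove(gene_id, "^[^|]*\\|"))
}
```

To apply this to the data frame from the question, the result can be assigned back to the column, e.g. `data3$GeneID <- sub("^[^|]*\\|", "", data3$GeneID)`. The pattern here stops at the first "|", whereas `gsub(".*\\|", "", ...)` is greedy and removes everything up to the last "|"; for IDs with a single "|", as in the sample, both give the same result.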
