Removing characters at the end of a string using R

Question

I have a large dataset and I would like to remove the suffix at the end of each string: the letter e, v, or i followed by one or more digits. My dataset looks like this:

P*01:01:05e1
P*01:01:05e2
P*01:01:05e3
P*01:01:05e10
P*02:02v1
P*02:02v2
P*02:01:03v2
P*05:01:01i1
P*05:01:01i8

and I want it to be P*01:01:05, P*02:02, P*02:01:03, P*05:01:01. I first tried removing the 'e' letters using

> xdata$gene <-gsub("e*", "", xdata$gene, perl = TRUE)

but I get this error message

Error in `$<-.data.frame`(`*tmp*`, "gene", value = character(0)) : 
  replacement has 0 rows, data has 58

It appears I cannot replace 'e' with nothing. Any suggestions?

Data

xdata <- read.table(header = TRUE, stringsAsFactors = FALSE,
                    text = "gene
                    P*01:01:05e1
                    P*01:01:05e2
                    P*01:01:05e3
                    P*01:01:05e10
                    P*02:02v1
                    P*02:02v2
                    P*02:01:03v2
                    P*05:01:01i1
                    P*05:01:01i8")

Try `stringr::str_split_fixed(df1$V1, pattern = "e|v|i", n = 2)` — zx8754, Nov 18 '16 at 21:05
What about: `strings <- c("P*01:01:05e1", "P*02:01:03v2"); strings <- chartr("evi", "   ", strings); gsub(" ", "", strings)`, which gives `[1] "P*01:01:051" "P*02:01:032"` — William, Nov 18 '16 at 21:08
@Sotos split then get 1st column? I will leave to community if this needs re-opening. `stringr::str_split_fixed(df1$V1,pattern = "e|v|i", n = 2)[, 1]` — zx8754, Nov 18 '16 at 21:14
Yeah, I guess that's one way of doing it. So many dupes for these kinds of questions. — Sotos, Nov 18 '16 at 21:16
@Sotos Exactly my point: many, many dupes. Agreed, the target is not a 100% dupe, but it gives enough knowledge to get to the right solution. — zx8754, Nov 18 '16 at 21:17
No one really addressed the error... @Mona, I suspect you are misspelling the column name in your `gsub` call. For example, I get that error if I use `xdata$gene <- gsub("e*", "", xdata$dasdfaldfalasdfasd)`. For the example data, your code runs without error, but as pointed out you probably want `gsub('[evi].*', '', xdata$gene)` instead. — rawr, Nov 18 '16 at 22:10
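To illustrate the point above, here is a minimal sketch in base R. It assumes a small data frame shaped like the example data; the column name `no_such_column` is made up to stand in for a misspelled name, and the anchored pattern `[evi][0-9]+$` is one possible choice, slightly stricter than the `[evi].*` suggested in the comment.

```r
# Small sample shaped like the data in the question
xdata <- data.frame(
  gene = c("P*01:01:05e1", "P*01:01:05e10", "P*02:02v2", "P*05:01:01i8"),
  stringsAsFactors = FALSE
)

# A column that does not exist comes back as NULL, and gsub() on NULL
# returns character(0). Assigning that zero-length result to a column
# is what produces "replacement has 0 rows".
bad <- gsub("e*", "", xdata$no_such_column)
length(bad)

# "e*" only deletes the letter e and leaves the digits behind
gsub("e*", "", "P*01:01:05e1")

# Remove one of e, v, i plus the digits after it, only at the end of the string
cleaned <- sub("[evi][0-9]+$", "", xdata$gene)
cleaned
```

Anchoring with `$` keeps the pattern from touching an e, v, or i that might appear earlier in a string; whether that matters depends on the real data.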
I spotted the error and edited the formula and it worked. FYI the formula is: `data$B_newY <- gsub("([evi]\\d+)", "", data$B_old, perl = TRUE)` — Mona, Nov 18 '16 at 22:45
Also, I have 10 columns of data but only want to apply the formula to 9 of them. Any suggestions? — Mona, Nov 18 '16 at 22:52
`sub_fun <- function(x) gsub("[eiv].*", "", x); data[, -1] <- lapply(data[, -1], sub_fun)` should work — rawr, Nov 18 '16 at 23:32
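The multi-column approach from the last comment can be sketched as follows. The data frame, its column names (`id`, `B_old`, `C_old`), and the helper name `strip_suffix` are hypothetical, and the sketch assumes the column to skip is the first one.

```r
# Hypothetical data: the first column must stay as it is,
# the remaining columns carry the e/v/i suffix
data <- data.frame(
  id    = c("e1", "v2"),
  B_old = c("P*01:01:05e1", "P*02:02v2"),
  C_old = c("P*05:01:01i8", "P*02:01:03v2"),
  stringsAsFactors = FALSE
)

# Helper that drops a trailing e, v, or i followed by digits
strip_suffix <- function(x) sub("[evi][0-9]+$", "", x)

# Apply the helper to every column except the first
data[-1] <- lapply(data[-1], strip_suffix)
data
```

Using `data[-1]` rather than `data[, -1]` keeps the selection a data frame even if only one column is left after dropping the first.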
