How do I remove part of a string? For example, in ATGAS_1121 I want to remove everything before the _.
6 Answers
Use regular expressions. In this case, you can use gsub:
gsub("^.*?_","_","ATGAS_1121")
[1] "_1121"
This regular expression matches the beginning of the string (^), any character (.) repeated zero or more times (*), and an underscore (_). The ? makes the match "lazy" so that it only matches as far as the first underscore. That match is replaced with just an underscore. See ?regex for more details and references.
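To make the lazy-versus-greedy difference concrete, here is a small sketch. The second input string, with two underscores, is an illustrative assumption taken from the kind of case the comments mention, not from the question itself.

```r
# Illustrative input: the second string contains two underscores.
x <- c("ATGAS_1121", "ATGAS_1121_xxx")

# Lazy: .*? stops at the first underscore.
lazy <- sub("^.*?_", "_", x)

# Greedy: .* runs on to the last underscore.
greedy <- sub("^.*_", "_", x)

print(lazy)    # "_1121" "_1121_xxx"
print(greedy)  # "_1121" "_xxx"
```

Because the pattern is anchored with ^, it can match at most once per string, so sub and gsub should behave the same here.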

- Previous regex would match to the last underscore in the case of, e.g., `gsub("^.*_","_","ATGAS_1121_xxx")`. Now fixed. – Richie Cotton Mar 14 '12 at 15:03
- @Joshua I find it really useful that you explained the role of the regular expressions. – Vasile Sep 17 '15 at 15:40
- This also works with a vector of strings as the last argument. R is awesome like that. – naught101 Feb 14 '17 at 03:39
You can use a built-in for this, strsplit:
> s = "TGAS_1121"
> s1 = unlist(strsplit(s, split='_', fixed=TRUE))[2]
> s1
[1] "1121"
strsplit returns a list with one element per input string; each element is a character vector holding the pieces of the string split on the split parameter. That's probably not what you want, so wrap the call in unlist, then index the resulting vector so that only the second of the two elements is returned.
Finally, the fixed parameter should be set to TRUE to indicate that the split parameter is not a regular expression, but a literal string to match.
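A short sketch of both points, the list return value and the effect of fixed. The dotted variant of the string is a hypothetical input added only to show a separator that is also a regex metacharacter.

```r
s <- "TGAS_1121"

# strsplit always returns a list, one element per input string.
parts <- strsplit(s, split = "_", fixed = TRUE)
is_list <- is.list(parts)

# [[1]] extracts the character vector for the first (and only) input.
second <- parts[[1]][2]

# Hypothetical variant whose separator is a regex metacharacter.
dotted <- "TGAS.1121"

# fixed = TRUE: "." is a literal dot.
literal_split <- strsplit(dotted, split = ".", fixed = TRUE)[[1]]

# Default: "." is a regex that matches any character,
# so every character counts as a separator.
regex_split <- strsplit(dotted, split = ".")[[1]]

print(second)
print(literal_split)
print(regex_split)
```

For an underscore the two modes agree, since _ has no special meaning in a regular expression; fixed = TRUE mainly guards against separators such as . or |.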

If you're a Tidyverse kind of person, here's the stringr solution:
R> library(stringr)
R> strings = c("TGAS_1121", "MGAS_1432", "ATGAS_1121")
R> strings %>% str_replace(".*_", "_")
[1] "_1121" "_1432" "_1121"
# Or:
R> strings %>% str_replace("^[A-Z]*", "")
[1] "_1121" "_1432" "_1121"

Here's the strsplit solution if s is a vector:
> s <- c("TGAS_1121", "MGAS_1432")
> s1 <- sapply(strsplit(s, split='_', fixed=TRUE), function(x) (x[2]))
> s1
[1] "1121" "1432"

- Very helpful, thanks! FYI to get the first part of the string (i.e. before the '_'), replace the [2] on the end with a [1]. – stevenjoe Jan 06 '16 at 21:41
- @verbamour do you know how to modify this to keep the first 2 elements of the string – d3hero23 Mar 16 '22 at 15:17
- @d3hero23 I believe @stevenjoe answered that above. Applying his solution gives you `s1 <- sapply(strsplit(s, split='_', fixed=TRUE), function(x) (x[1]))` – verbamour Mar 24 '22 at 18:40
Probably the most intuitive solution is to use the stringr function str_remove, which is even easier than str_replace because it takes only a pattern and no replacement argument.
The only tricky part in your example is that you want to keep the underscore, but it's possible: use a lookahead, (?=pattern), so that the match stops just before the specified pattern without consuming it.
See example:
library(stringr)  # attaches str_remove and the %>% pipe
strings = c("TGAS_1121", "MGAS_1432", "ATGAS_1121")
strings %>% stringr::str_remove(".+?(?=_)")
[1] "_1121" "_1432" "_1121"
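If you'd rather not load stringr, the same lookahead idea can be sketched in base R. This is an alternative I'm adding for comparison, not part of the stringr approach above; it assumes perl = TRUE, which base R needs for lookahead support.

```r
strings <- c("TGAS_1121", "MGAS_1432", "ATGAS_1121")

# (?=_) asserts that an underscore follows without consuming it,
# so only the text before the first underscore is removed.
result <- sub("^.+?(?=_)", "", strings, perl = TRUE)

print(result)  # "_1121" "_1432" "_1121"
```

Note that .+? needs at least one character before the underscore, so a string that starts with _ keeps its leading underscore and loses everything up to the next one, if there is one.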

Here's the strsplit solution for a data frame, using the dplyr package:
library(dplyr)  # provides mutate
col1 = c("TGAS_1121", "MGAS_1432", "ATGAS_1121")
col2 = c("T", "M", "A")
df = data.frame(col1, col2)
df
        col1 col2
1  TGAS_1121    T
2  MGAS_1432    M
3 ATGAS_1121    A
# Make sure col1 is character rather than factor before splitting
df<-mutate(df,col1=as.character(col1))
# Keep only the part after the underscore
df2<-mutate(df,col1=sapply(strsplit(df$col1, split='_', fixed=TRUE),function(x) (x[2])))
df2
  col1 col2
1 1121    T
2 1432    M
3 1121    A
