
I have a column of strings from which I would like to remove everything after the last '.'

I tried:

sub('\\..*', '', x)

But my problem is that some of the strings contain two '.' and others only one, e.g.

ENST00000338167.9
ABCDE.42927.6

How can I remove only the characters after the last '.' (along with that '.' itself), so that I'm left with:

ENST00000338167
ABCDE.42927

Many thanks!!

G5W
zoe

2 Answers

We can use sub to match the last . (escaped with \\ because an unescaped . is a metacharacter that matches any character), followed by zero or more characters that are not a . ([^.]*), up to the end ($) of the string, and replace the match with an empty string ("")

sub("\\.[^.]*$", "", x)
#[1] "ENST00000338167" "ABCDE.42927"    

Or use str_remove from stringr

library(stringr)
str_remove(x, "\\.[^.]*$")
#[1] "ENST00000338167" "ABCDE.42927"  

data

x <- c("ENST00000338167.9", "ABCDE.42927.6")
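
Since the question mentions a column of strings, here is a minimal sketch of applying the same pattern to a data frame column. The names df and id are hypothetical (they do not appear in the question), so substitute your own.

```r
# Hypothetical data frame; `df` and `id` are illustrative names, not from the question.
df <- data.frame(id = c("ENST00000338167.9", "ABCDE.42927.6"),
                 stringsAsFactors = FALSE)

# Drop the last "." and everything after it, overwriting the column.
df$id <- sub("\\.[^.]*$", "", df$id)

print(df$id)
```

sub is vectorised over its third argument, so no loop or apply call is needed.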
akrun

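Yet another way is to "capture" the part before the last '.' and keep only that. The (.*) group is greedy, so it extends as far as it can and the \\. that follows ends up matching the last dot.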

sub("(.*)\\..*", "\\1", x)
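
To make the difference from the attempt in the question concrete, the sketch below runs both patterns side by side. The third element, "NODOT", is an assumed extra test case (not from the question) showing that a string without any '.' is returned unchanged.

```r
# "NODOT" is an illustrative extra case, not part of the original data.
x <- c("ENST00000338167.9", "ABCDE.42927.6", "NODOT")

# Attempt from the question: "\\." matches the FIRST dot, so everything after it is dropped.
first_dot <- sub("\\..*", "", x)

# Capture approach: the greedy "(.*)" gives back only what it must, so "\\." matches the LAST dot.
last_dot <- sub("(.*)\\..*", "\\1", x)

print(first_dot)
print(last_dot)
```

Only the second string differs between the two results, because it is the only one with more than one '.'.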
G5W