5

I have a regex puzzle that's eluding me: I'd like to remove the final period, and everything after it, from each string in a collection (I am using R, so Perl syntax is available).

For example, if the string is ABCD.txt, the result should be ABCD, and if the string is abc.com.foo.bar, the result should be abc.com.foo.

Any help gratefully appreciated (I don't think I can drink any more coffee!).

oguz ismail
ricardo

4 Answers

10

Here are a few solutions:

sub("^(.*)[.].*", "\\1", "abc.com.foo.bar") # 1
## [1] "abc.com.foo"

library(tools)
file_path_sans_ext("abc.com.foo.bar") # 3
## [1] "abc.com.foo"

ADDED. Regarding your comment asking to remove leading periods, the simplest approach is to strip them first and feed the result into any of the above, where x is the input string:

sub("^[.]*", "", x)
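As a minimal sketch of that two-step idea, here is the leading-period cleanup fed into solution #1. The vector `x` and the intermediate names `stripped` and `result` are chosen only for illustration; strings with no remaining period (such as `vimrc`) pass through unchanged because `sub` returns its input when the pattern does not match.

```r
x <- c("abc.com.foo.bar", ".abc.com.foo.bar", ".vimrc")

# Step 1: drop any leading periods
stripped <- sub("^[.]*", "", x)

# Step 2: drop the final period and everything after it (solution #1)
result <- sub("^(.*)[.].*", "\\1", stripped)
result
## [1] "abc.com.foo" "abc.com.foo" "vimrc"
```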

To do any of them in one line:

x <- c("abc.com.foo.bar", ".abc.com.foo.bar", ".vimrc")

sub("^[.]*(.*?)([.][^.]*)?$", "\\1", x, perl = TRUE) # 1a
## [1] "abc.com.foo" "abc.com.foo" "vimrc"

file_path_sans_ext(sub("^[.]*", "", x))
## [1] "abc.com.foo" "abc.com.foo" "vimrc" 
G. Grothendieck
  • Is it too much to ask for a version that also trims leading periods, such that `.vimrc` becomes `vimrc`? (Sorry, I didn't realise this case until you solved my major problem.) – ricardo Jul 25 '13 at 01:19
  • add `\\.` after the `^`. – Justin Jul 25 '13 at 01:20
  • @G.Grothendieck: Thanks for another opportunity to upvote your insightful contributions. You taught me most of what I know about R-regex by way of your many postings to Rhelp. – IRTFM Jul 25 '13 at 01:25
  • @Justin -- thanks so much. working perfectly now. wish i'd asked earlier. – ricardo Jul 25 '13 at 01:25
  • Why do you show an example with `abc.foo.bar` (#2)? It's definitely not what the OP wants (and actually it's useless for everyone). – vladkras Jul 25 '13 at 01:29
  • `sub("(.*[^.])[.][^.]+", "\\1", "abc.com.foo.bar")`, more readable I think. – lcn Jul 25 '13 at 04:36
  • @Justin -- I have another corner case: `.abc.com` ... I'd like to have `abc` returned. Is this possible in one regex? At the moment I'm using two, linked with an if statement. – ricardo Jul 25 '13 at 07:26
  • @ricardo, have provided some additional code to address your leading string query. – G. Grothendieck Jul 25 '13 at 10:18
3

And a non-regex answer for no reason whatsoever:

test <- c("abc.com.foo.bar","ABCD.txt")
sapply(strsplit(test,"\\."), function(x) paste0(head(x,-1),collapse=".") )
#[1] "abc.com.foo" "ABCD"
thelatemail
  • To be completely accurate this is a simpler regex rather than a non-regex solution as `"\\."` is a regex. Using `strsplit(test, ".", fixed = TRUE)` would be a non-regex solution. – G. Grothendieck Nov 14 '16 at 15:02
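The `fixed = TRUE` variant mentioned in the comment above could look roughly like this. It is only a sketch: the names `parts` and `res` are illustrative, and `vapply` is swapped in for `sapply` purely so the return type is guaranteed to be character. One caveat worth knowing: a string with no period at all splits into a single piece, so dropping the last piece leaves an empty string.

```r
test <- c("abc.com.foo.bar", "ABCD.txt")

# Split on a literal "." (no regex involved because fixed = TRUE)
parts <- strsplit(test, ".", fixed = TRUE)

# Drop the last piece of each split and glue the rest back together
res <- vapply(parts,
              function(p) paste(head(p, -1), collapse = "."),
              character(1))
res
## [1] "abc.com.foo" "ABCD"
```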
2

You can use `sub`, for example like this:

sub('(.*)[.](.*)','\\1',c('abc.com.foo.bar','ABCD.txt'))
[1] "abc.com.foo" "ABCD"  
agstudy
1

I cannot help you with R and I have almost forgotten Perl, but this works in both JS (proof) and PHP:

/\.[A-Za-z]+$/     -->    replace this with empty string ""
  ^    ^    ^
  |    |    |
  |    |    end of line
  |    only chars (you can add 0-9 if numbers are also present)
  dot before last chars

Regex syntax is fairly standard across languages, so I'm sure you can adapt it (probably by just dropping the / delimiters).
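For what it's worth, an R adaptation of the pattern above might look like the sketch below. The `/` delimiters are dropped and the backslash is doubled because it sits inside an R string literal; the vector `x` and the name `out` are just for illustration. As noted in the diagram, an extension containing digits would need `0-9` added to the character class.

```r
x <- c("abc.com.foo.bar", "ABCD.txt")

# Remove a final period followed only by letters, anchored at the end of the string
out <- sub("\\.[A-Za-z]+$", "", x)
out
## [1] "abc.com.foo" "ABCD"
```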

vladkras