I have a regex edge case that I am unable to solve. I need a regex that removes the leading period (if it exists), and the last period together with the text following it (if it exists), from a string.

That is, given a vector:

x <- c("abc.txt", "abc.com.plist", ".abc.com")

I'd like to get the output:

[1] "abc"     "abc.com" "abc"

The first two cases are already solved; I obtained help in this related question. The third case, with a leading period, is not.

I am sure it is trivial, but I'm not making the connections.

ricardo
  • it's not actually a duplicate as the answers do not solve the case `.abc.com` to `abc` ... i wish that they did. perhaps i've made some error copying, and if so i'll delete this qn -- please advise – ricardo Jul 25 '13 at 08:31
  • I just realised that. Here's one that'll do as expected: `sub("^[.]*(.*)[.].*$", "\\1", x)`. I'll vote to reopen. – Arun Jul 25 '13 at 08:32
  • I've voted to reopen, but I still think this could be reasonably incorporated into the other question. – Thomas Jul 25 '13 at 08:34
  • Thomas, I agree, although he has left a comment and had no answer. I don't blame him. But @ricardo, link to the other question and say why you don't find the answer complete or how this question is different from that one to avoid such confusions. – Arun Jul 25 '13 at 08:35
  • I have edited to reflect the edge case and have linked to the other qn. thanks folks. – ricardo Jul 25 '13 at 08:37

1 Answer

This regex does what you want:

^\.+|\.[^.]*$

Replace its matches with the empty string.

In R:

gsub("^\\.+|\\.[^.]*$", "", x, perl=TRUE)

Explanation:

^      # Anchor the match to the start of the string
\.+    # and match one or more dots
|      # OR
\.     # Match a dot
[^.]*  # plus any characters except dots
$      # anchored to the end of the string.
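
To check the pattern against the vector from the question, a minimal sketch along these lines may help. The helper name `strip_dots` is hypothetical and only used for illustration; the pattern is the one given above, and `perl=TRUE` is left out on the assumption that R's default regex engine handles it the same way.

```r
# Vector from the question
x <- c("abc.txt", "abc.com.plist", ".abc.com")

# Hypothetical helper (illustrative name, not from the answer):
# removes any leading dots and the final ".suffix" in one pass.
strip_dots <- function(s) {
  gsub("^\\.+|\\.[^.]*$", "", s)
}

strip_dots(x)
# [1] "abc"     "abc.com" "abc"
```

`gsub` rather than `sub` matters here: a string such as ".abc.com" needs both alternatives to match, once at the start and once at the end, and `sub` would stop after the first replacement.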
Tim Pietzcker
  • `perl=TRUE` is not strictly necessary here, though it doesn't hurt either. – Brian Diggs Jul 25 '13 at 16:46
  • +1 / accepted. thanks very much -- that's perfect. Also, i really appreciate you making the effort to add the explanation. I'm in the steepest part of the regex learning curve. – ricardo Jul 25 '13 at 20:13