R Regex to identify and replace characters between multiple dots

Question

I have the following codes

"ABC.A.SVN.10.10.390.10.UDGGL"
"XYZ.Z.SVN.11.12.111.99.ASDDL"

and I need to replace the characters between the 2nd and the 3rd dot. In this case it is SVN, but it could be any combination from A to ZZZ, so the only reliable way to locate it is by using the dots.

The required outcome would be:

"ABC.A..10.10.390.10.UDGGL"
"XYZ.Z..11.12.111.99.ASDDL"

I tried variants of grep("^.+(\\.\\).$", "ABC.A.SVN.10.10.390.10.UDGGL") but I get an error.

Some examples of what I have tried, with no success:

Link 1 Link 2

EDIT

I tried @Onyambu's first method and ran into a variant I had not accounted for: "ABC.A.AB11.1.12.112.1123.UDGGL". Here the part to be replaced also contains numeric values. The desired outcome is "ABC.A..1.12.112.1123.UDGGL", and I get it using sub("\\.\\w+.\\B.",".",x), per the second part of his answer!

score 3 · Accepted Answer · answered Jan 26 '18 at 17:13

See code in use here

x <- c("ABC.A.SVN.10.10.390.10.UDGGL", "XYZ.Z.SVN.11.12.111.99.ASDDL")
sub("^(?:[^.]*\\.){2}\\K[^.]*", "", x, perl = TRUE)

^ Assert position at the start of the line
(?:[^.]*\.){2} Match the following exactly twice
- [^.]*\. Match any character except . any number of times, followed by .
\K Resets the starting point of the pattern. Any previously consumed characters are no longer included in the final match
[^.]* Match any character except . any number of times

Results in [1] "ABC.A..10.10.390.10.UDGGL" "XYZ.Z..11.12.111.99.ASDDL"
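Because [^.]* accepts any character other than a dot, the same call should also cover the alphanumeric variant mentioned in the question's edit. A quick check, reusing that string as test input:

```r
# One string from the original question and the variant from the EDIT
x <- c("ABC.A.SVN.10.10.390.10.UDGGL", "ABC.A.AB11.1.12.112.1123.UDGGL")

# Skip the first two "field + dot" groups, then drop whatever precedes the next dot
res <- sub("^(?:[^.]*\\.){2}\\K[^.]*", "", x, perl = TRUE)
print(res)
# [1] "ABC.A..10.10.390.10.UDGGL"  "ABC.A..1.12.112.1123.UDGGL"
```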

This answer is the one I used, as it is easier for me as a non-expert in regex. I can also customize the `{2}` (the "exactly twice" element). — J. Doe., Feb 08 '18 at 10:57
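Following up on the comment about customizing `{2}`: one way to do that is to build the pattern from the field number. This is only a sketch; the name blank_field and its arguments are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper: replace the n-th dot-separated field (1-based) in each string.
# The quantifier {n - 1} skips the fields before the target; \K discards them from the match.
blank_field <- function(x, n, replacement = "") {
  n <- as.integer(n)
  stopifnot(n >= 1L)
  pattern <- sprintf("^(?:[^.]*\\.){%d}\\K[^.]*", n - 1L)
  sub(pattern, replacement, x, perl = TRUE)
}

x <- c("ABC.A.SVN.10.10.390.10.UDGGL", "XYZ.Z.SVN.11.12.111.99.ASDDL")
blank_field(x, 3)
# [1] "ABC.A..10.10.390.10.UDGGL" "XYZ.Z..11.12.111.99.ASDDL"
blank_field(x, 3, "NEW")
# [1] "ABC.A.NEW.10.10.390.10.UDGGL" "XYZ.Z.NEW.11.12.111.99.ASDDL"
```

Strings with fewer than n fields do not match the pattern, so sub() returns them unchanged. Note that backslashes in replacement are still interpreted by sub() as backreferences.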

Onyambu · Answer 2 · 2018-01-26T17:36:22.907

x <- c("ABC.A.SVN.10.10.390.10.UDGGL", "XYZ.Z.SVN.11.12.111.99.ASDDL")
sub("([A-Z]+)(\\.\\d+)", "\\2", x)

[1] "ABC.A..10.10.390.10.UDGGL" "XYZ.Z..11.12.111.99.ASDDL"

([A-Z]+) Capture a run of one or more characters in the range A-Z.
(\\.\\d+) The captured letters must be followed by a dot, i.e. \\., and this dot must in turn be followed by digits, i.e. \\d+. This completes the second capture.

So for the string "ABC.A.SVN.10.10.390.10.UDGGL", the matched part is SVN.10, since this is the first part that matches the regular expression. It was captured as two groups, SVN and .10, so we use a backreference, i.e. replace the whole match SVN.10 with the 2nd group, .10.
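Since this pattern requires letters directly followed by a dot and digits, it will not fire when the field itself contains digits, which is the variant reported in the question's edit. A small check, using the strings from the question:

```r
pattern <- "([A-Z]+)(\\.\\d+)"

# Letters-only field: SVN.10 is matched and replaced by .10
ok <- sub(pattern, "\\2", "ABC.A.SVN.10.10.390.10.UDGGL")

# Field AB11 contains digits: no "letters, dot, digits" sequence exists,
# so sub() finds no match and returns the input unchanged
unchanged <- sub(pattern, "\\2", "ABC.A.AB11.1.12.112.1123.UDGGL")

print(ok)         # [1] "ABC.A..10.10.390.10.UDGGL"
print(unchanged)  # [1] "ABC.A.AB11.1.12.112.1123.UDGGL"
```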

Another pattern that will work:

sub("\\.\\w+.\\B.",".",x)
[1] "ABC.A..10.10.390.10.UDGGL" "XYZ.Z..11.12.111.99.ASDDL"
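A caveat, as far as I can trace this pattern: it is not anchored to the 2nd dot. It matches the first dot followed by one or more word characters, then any two characters with no word boundary between them. On the sample data that lands on .SVN only because the second field (A, Z) is a single character. The second input below is made up to illustrate what happens with a longer second field:

```r
pattern <- "\\.\\w+.\\B."

# Sample from the question: the first place the pattern fits is ".SVN"
good <- sub(pattern, ".", "ABC.A.SVN.10.10.390.10.UDGGL")

# Hypothetical input with a three-letter second field: ".ABC" fits first,
# so the 2nd field is removed instead of the 3rd
bad <- sub(pattern, ".", "ABC.ABC.SVN.10")

print(good)  # [1] "ABC.A..10.10.390.10.UDGGL"
print(bad)   # [1] "ABC..SVN.10"
```

If the second field can be longer than two characters, a pattern that counts dots from the start of the string is the safer choice.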

score 1 · Answer 3 · answered Jan 26 '18 at 17:15

Not exactly regex but here is one more approach

#DATA
S = c("ABC.A.SVN.10.10.390.10.UDGGL", "XYZ.Z.SVN.11.12.111.99.ASDDL")

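sapply(X = S,
       FUN = function(str){
           # positions of the 2nd and 3rd dots in the string
           ind = unlist(gregexpr("\\.", str))[2:3]
           # keep everything up to the 2nd dot, insert the new text,
           # then keep everything from the 3rd dot onward
           paste(c(substring(str, 1, ind[1]),
                   "SUBSTITUTION",
                   substring(str, ind[2])), collapse = "")
       },
       USE.NAMES = FALSE)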
#[1] "ABC.A.SUBSTITUTION.10.10.390.10.UDGGL" "XYZ.Z.SUBSTITUTION.11.12.111.99.ASDDL"
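In the same non-regex spirit, a variation is to split each string on the dots, overwrite the field by index, and paste the pieces back together. This is only a sketch; replace_field and its arguments are names I made up for illustration.

```r
#DATA
S = c("ABC.A.SVN.10.10.390.10.UDGGL", "XYZ.Z.SVN.11.12.111.99.ASDDL")

# Hypothetical helper: split on literal dots, overwrite the n-th field, rejoin
replace_field = function(x, n = 3L, replacement = "") {
    parts_list = strsplit(x, ".", fixed = TRUE)
    vapply(parts_list, function(parts) {
        # leave strings with fewer than n fields untouched
        if (length(parts) >= n) parts[n] = replacement
        paste(parts, collapse = ".")
    }, character(1))
}

replace_field(S)
#[1] "ABC.A..10.10.390.10.UDGGL" "XYZ.Z..11.12.111.99.ASDDL"
replace_field(S, replacement = "SUBSTITUTION")
#[1] "ABC.A.SUBSTITUTION.10.10.390.10.UDGGL" "XYZ.Z.SUBSTITUTION.11.12.111.99.ASDDL"
```

One thing to watch: strsplit() drops an empty field at the very end of a string, so an input that ends in a dot would lose that trailing dot after the round trip.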
