Why I should have to gsub \\1 instead of ""
A back-reference tells the engine to match the characters that were captured by a capturing group. A capturing group is created by placing the characters to be grouped inside a set of parentheses, ( ... ). Every set of capturing parentheses gets assigned a number, counting from left to right, whether or not the engine uses these parentheses when it evaluates the match.
In this case you need to use the back-reference \1 (written as '\\1' in an R string, because the backslash itself has to be escaped) inside the replacement call, so that the characters matched by Group 1 are assigned to the new string aa. By using "" instead, you're assigning aa an empty value, since the regular expression pattern matches the entire string and the whole match is replaced with nothing.
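To make the difference concrete, here is a minimal sketch. The pattern is a hypothetical stand-in, since the exact regex from the question is not reproduced here; it is only assumed to match the whole string and capture the leading letters in Group 1.

```r
# Hypothetical pattern (an assumption, not the asker's original regex):
# Group 1 captures the capital letters before the underscore, and the
# rest of the pattern consumes everything up to the end of the string.
AAA <- 'ATGAS_1121'
pattern <- '^([A-Z]+)_.*$'

# Replace the whole match with the contents of Group 1.
kept <- gsub(pattern, '\\1', AAA)

# Replace the whole match with nothing.
emptied <- gsub(pattern, '', AAA)

kept
# [1] "ATGAS"
emptied
# [1] ""
```

Because the pattern matches the entire string, the replacement decides everything that survives: '\\1' puts the captured part back, while '' leaves nothing.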
I am also a bit confused by how the operators are being used ... brackets
The square brackets [ ... ] you're asking about form what is called a character class, which defines a set of characters. It tells the engine to "match one of the characters specified by the class".
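As a small illustration of a character class, the sketch below uses made-up test strings (they are not from the question). The class [ao] matches exactly one character, which must be either a or o.

```r
# Hypothetical inputs chosen only to illustrate the character class.
x <- c('cat', 'cot', 'cut', 'ct')

# '^c[ao]t$' means: c, then exactly one of a or o, then t.
matches <- grepl('^c[ao]t$', x)

matches
# [1]  TRUE  TRUE FALSE FALSE
```

'cut' fails because u is not in the class, and 'ct' fails because the class must consume one character.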
How I would recommend doing this:
In this example, a regular expression is not needed at all; you can simply split the string on the underscore and keep the first piece.
AAA <- 'ATGAS_1121'
strsplit(AAA, '_', fixed=TRUE)[[1]][1]
# [1] "ATGAS"
And if you insist on using a regular expression, you can use sub as follows instead, which removes the underscore and everything after it:
AAA <- 'ATGAS_1121'
sub('_.*', '', AAA)
# [1] "ATGAS"