Your problem is trickier than it looks, so it deserves a detailed answer. First, let's put your example in a vector:
exStrg <- c(
'bing_cpc>uswitch.com_referral',
'bing_cpc>money.co.uk_referral',
'bing_cpc>moneysupermarket.com_referral>google_organic>moneysupermarket.com_referral>google_cpc>google_cpc'
)
What you want is to replace every substring matching the pattern '>xxxxx_referral' with '>Referral'. gsub is the function for that, and the obvious pattern would be '>.*_referral', the dot meaning "any character" and the asterisk meaning "zero or more times". But the * and + quantifiers are greedy: they match as many characters as possible while still letting the whole pattern succeed. This is what happens:
> gsub(pattern = '>.*_referral', replacement = '>Referral', exStrg)
[1] "bing_cpc>Referral"
[2] "bing_cpc>Referral"
[3] "bing_cpc>Referral>google_cpc>google_cpc"
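To see exactly what the greedy pattern grabs, a quick check (base R only) is to extract the matches with regmatches() and gregexpr() instead of replacing them. The variable name greedy is just for illustration:

```r
exStrg <- c(
  'bing_cpc>uswitch.com_referral',
  'bing_cpc>money.co.uk_referral',
  'bing_cpc>moneysupermarket.com_referral>google_organic>moneysupermarket.com_referral>google_cpc>google_cpc'
)

# Pull out the substrings the greedy pattern matches, rather than replacing them
greedy <- regmatches(exStrg, gregexpr('>.*_referral', exStrg))

# For the third path there is a single match spanning several steps
print(greedy[[3]])
```

For the third path this returns one long match, running from the first '>' all the way to the last '_referral', which is why '>google_organic' disappears from the result.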
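The expression takes everything between the first '>' and the last '_referral'. You can append ? to the quantifier to make it lazy. That finds multiple occurrences of your pattern, but each match still starts at the earliest available '>' and stretches over any '>' in between until it reaches a '_referral', so '>google_organic' is still swallowed: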
> gsub('>.*?_referral', '>Referral', exStrg)
[1] "bing_cpc>Referral"
[2] "bing_cpc>Referral"
[3] "bing_cpc>Referral>Referral>google_cpc>google_cpc"
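Extracting the lazy matches the same way (base R only; the variable name lazy is illustrative) shows where the remaining problem comes from:

```r
exStrg <- c(
  'bing_cpc>uswitch.com_referral',
  'bing_cpc>money.co.uk_referral',
  'bing_cpc>moneysupermarket.com_referral>google_organic>moneysupermarket.com_referral>google_cpc>google_cpc'
)

# '.*?' is the minimal (lazy) form of '.*': each match stops at the first
# '_referral' it can reach, but it may still cross '>' characters on the way
lazy <- regmatches(exStrg, gregexpr('>.*?_referral', exStrg))

print(lazy[[3]])
```

The third path now yields two matches, but the second one begins at the '>' before google_organic and runs through the next referral step, so both steps are replaced together.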
What you need instead is to stop a match from crossing a '>', by replacing the dot with the negated character class [^>] (any character except '>'):
> gsub('>[^>]*_referral', '>Referral', exStrg)
[1] "bing_cpc>Referral"
[2] "bing_cpc>Referral"
[3] "bing_cpc>Referral>google_organic>Referral>google_cpc>google_cpc"
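If you need this in more than one place, it can be wrapped in a small function. A minimal sketch, where collapse_referrals is a hypothetical name and the pattern is the one above; the regmatches() call confirms that every match is now confined to a single '>'-separated step:

```r
exStrg <- c(
  'bing_cpc>uswitch.com_referral',
  'bing_cpc>money.co.uk_referral',
  'bing_cpc>moneysupermarket.com_referral>google_organic>moneysupermarket.com_referral>google_cpc>google_cpc'
)

# Hypothetical helper: collapse every '>something_referral' step into '>Referral'.
# '[^>]*' matches any run of characters except '>', so a match can never
# extend past the end of the step it started in.
collapse_referrals <- function(paths) {
  gsub('>[^>]*_referral', '>Referral', paths)
}

# Each match is exactly one referral step
steps <- regmatches(exStrg[3], gregexpr('>[^>]*_referral', exStrg[3]))[[1]]
print(steps)

print(collapse_referrals(exStrg))
```

Note that, like the pattern it wraps, this only rewrites referral steps preceded by '>', so a path that starts with a referral step keeps that first step unchanged.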