
I have a data.frame with a column containing customer digital paths (see below). In each row I would like to replace all the text between > and _referral, including the _referral suffix, with the word Referral.

For example the 3 rows below

bing_cpc>uswitch.com_referral
bing_cpc>money.co.uk_referral
bing_cpc>moneysupermarket.com_referral>google_organic>moneysupermarket.com_referral>google_cpc>google_cpc

should be

bing_cpc>Referral
bing_cpc>Referral
bing_cpc>Referral>google_organic>Referral>google_cpc>google_cpc

Any idea? Thanks

Alex Taylor
cheikh
  • Welcome to SO! Please read [How to make a great R reproducible example?](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and edit your question accordingly. To help you, we need an example of your data posted using the `dput()` function and an example of the wanted result. – pogibas Oct 04 '18 at 15:03
  • Did you try anything? Where exactly did you get stuck? Surely you found some useful resources when you googled "r string replace." – MrFlick Oct 04 '18 at 15:03
  • @cheikh: so that you understand the down votes. Everyone is here to help, but remember that you're in a community mainly formed by busy professionals, and you are asking them to spend time and effort to solve your problem. Besides participating in the community and doing the same, the best way to repay is to [make a good question](https://stackoverflow.com/help/how-to-ask). That not only benefits the site as a whole, but also helps you: working on a good question usually leads you to find a possible solution. – Carlos Eduardo Lagosta Oct 04 '18 at 18:22

2 Answers


Try out:

# [^>]* cannot cross a '>', so each referral step is replaced separately
df$col <- gsub(">[^>]*_referral", ">Referral", df$col)
SmitM

Your problem is trickier than it looks, so it deserves a detailed answer. First, let's put your example in a vector:

exStrg <- c(
  'bing_cpc>uswitch.com_referral',
  'bing_cpc>money.co.uk_referral',
  'bing_cpc>moneysupermarket.com_referral>google_organic>moneysupermarket.com_referral>google_cpc>google_cpc'
)

What you want is to replace everything that matches the pattern '>xxxxx_referral' with '>Referral'. gsub is the function for that, and the obvious pattern would be '>.*_referral', the dot meaning "any character" and the asterisk meaning "occurring any number of times". But the * and + quantifiers are greedy, so this is what happens:

> gsub(pattern = '>.*_referral', replacement = '>Referral', exStrg)
[1] "bing_cpc>Referral"                      
[2] "bing_cpc>Referral"                      
[3] "bing_cpc>Referral>google_cpc>google_cpc"

The expression takes everything between the first '>' and the last '_referral'. You can add ? to make the quantifier lazy; that finds multiple occurrences of your pattern, but each match still starts at the first available '>' and runs to the next '_referral', so a step in between (here 'google_organic') is swallowed:

> gsub('>.*?_referral', '>Referral', exStrg)
[1] "bing_cpc>Referral"                               
[2] "bing_cpc>Referral"                               
[3] "bing_cpc>Referral>Referral>google_cpc>google_cpc"
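To see why the results differ, it can help to look at what each pattern actually matches rather than at the replaced string. The following is a minimal sketch; `path` and the helper `show_matches` are names made up for this illustration, not part of the question.

```r
# A single path, copied from the third example row
path <- 'bing_cpc>moneysupermarket.com_referral>google_organic>moneysupermarket.com_referral>google_cpc>google_cpc'

# Hypothetical helper: every substring of x that `pattern` matches
show_matches <- function(pattern, x) regmatches(x, gregexpr(pattern, x))[[1]]

greedy <- show_matches('>.*_referral', path)   # one long match
lazy   <- show_matches('>.*?_referral', path)  # two matches

print(greedy)
print(lazy)
```

The second lazy match begins at the '>' in front of google_organic, which is why that step disappears from the output.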

What you need instead is to keep the match from crossing into the next step, by excluding '>' from the matched text with a negated character class, [^>]:

> gsub('>[^>]*_referral', '>Referral', exStrg)
[1] "bing_cpc>Referral"                                              
[2] "bing_cpc>Referral"                                              
[3] "bing_cpc>Referral>google_organic>Referral>google_cpc>google_cpc"
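Since the question is about a data.frame column rather than a bare vector, here is a sketch of the same replacement applied in place. The data frame `df` and the column name `path` are assumptions for the example; substitute your own names.

```r
# Hypothetical data frame; the column name `path` is an assumption
df <- data.frame(
  path = c(
    'bing_cpc>uswitch.com_referral',
    'bing_cpc>money.co.uk_referral',
    'bing_cpc>moneysupermarket.com_referral>google_organic>moneysupermarket.com_referral>google_cpc>google_cpc'
  ),
  stringsAsFactors = FALSE  # keep the column as character, not factor
)

# Replace each '>something_referral' step; [^>]* cannot cross a '>'
df$path <- gsub('>[^>]*_referral', '>Referral', df$path)

# The result the question asks for
expected <- c(
  'bing_cpc>Referral',
  'bing_cpc>Referral',
  'bing_cpc>Referral>google_organic>Referral>google_cpc>google_cpc'
)
stopifnot(identical(df$path, expected))
```

Note that the pattern requires a leading '>', so a referral that is the very first step of a path would be left unchanged.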
  • Thanks a lot. It worked! – cheikh Oct 17 '18 at 15:57
  • Hi Carlos, in the following, how would I replace anything that follows the pattern > UK|B_Brandxxxxx with > Brand? – cheikh Oct 29 '18 at 12:43
  • UK|BP_Brand_Products_e_def > (unavailable) > UK|B_Brand_Pure Other_e_def > UK|B_Brand_Pure Only_e_def > UK|NB_Pure Only_e_def – cheikh Oct 29 '18 at 12:43
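For the follow-up in the comments, the same negated-class idea should carry over. This is a sketch under two assumptions: that steps are separated by ' > ' exactly as in the pasted string, and that only steps starting with 'UK|B_Brand' (not 'UK|BP_Brand' or 'UK|NB_') are to be replaced. The variable names are made up for the example. The '|' must be escaped because it otherwise means alternation, and a lookahead (which needs perl = TRUE) stops the match before the space in front of the next '>'.

```r
# String copied from the comment above
brandPath <- 'UK|BP_Brand_Products_e_def > (unavailable) > UK|B_Brand_Pure Other_e_def > UK|B_Brand_Pure Only_e_def > UK|NB_Pure Only_e_def'

# '\\|' is a literal pipe; [^>]*? stays inside one step;
# (?= >|$) stops before the next ' >' or at the end of the string
brandOut <- gsub('> UK\\|B_Brand[^>]*?(?= >|$)', '> Brand', brandPath, perl = TRUE)

print(brandOut)
```

With the string above this leaves the first, second and last steps untouched and turns the two 'UK|B_Brand...' steps into 'Brand'.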