
I have the word 'fan(s)' that I want to replace with 'fanatic(s)' when it is preceded by a pronoun-verb combination, as in the pattern below.

gsub(
    "(((s?he( i|')s)|((you|they|we)( a|')re)|(I( a|')m)).{1,20})(\\b[Ff]an)(s?\\b)", 
    '\\1\\2atic\\3', 
    'He\'s the bigest fan I know.', 
    perl = TRUE, ignore.case = TRUE
)

## [1] "He's the bigest He'saticHe's I know."

I know the numbered backreferences are referring to the inner parentheses of the first group. Is there a way to have them refer to just the three outer sets of parentheses, where the three groups are (stuff before fan)(fan)(s\\b) in pseudocode?
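To make the numbering concrete, here is a minimal sketch that prints what backreferences 1 to 3 hold for the pattern above. Groups are numbered by the position of their opening parenthesis, so \\2 and \\3 land inside the first outer group. The helper name `show_group` is made up for this illustration.

```r
x <- "He's the bigest fan I know."
pattern <- "(((s?he( i|')s)|((you|they|we)( a|')re)|(I( a|')m)).{1,20})(\\b[Ff]an)(s?\\b)"

# Hypothetical helper: replace the whole match with "[<group k>]"
# so the content of backreference k is visible in the result.
show_group <- function(k) {
    sub(pattern, paste0("[\\", k, "]"), x, perl = TRUE, ignore.case = TRUE)
}

show_group(1)  # "[He's the bigest ] I know."  outer group: everything before "fan"
show_group(2)  # "[He's] I know."              the alternation nested inside group 1
show_group(3)  # "[He's] I know."              the first alternative, (s?he( i|')s)
```

Counting opening parentheses from the left, (\\b[Ff]an) is group 10 and (s?\\b) is group 11 in this pattern.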

I know my regex can replace all the groups, so I know it's valid. It's just the backreference portion that is the problem.

gsub(
    "(((s?he( i|')s)|((you|they|we)( a|')re)|(I( a|')m)).{1,20})(\\b[Ff]an)(s?\\b)", 
    '', 
    'He\'s the bigest fan I know.', 
    perl = TRUE, ignore.case = TRUE
)

## [1] " I know."

Desired output:

## [1] "He's the bigest fanatic I know."

Examples of matches

inputs <- c(
    "He's the bigest fan I know.",
    "I am a huge fan of his.",
    "I know she has lots of fans in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)


outputs <- c(
    "He's the bigest fanatic I know.",
    "I am a huge fanatic of his.",
    "I know she has lots of fanatics in his club",
    "I was cold and turned on the fan",
    "An air conditioner is better than 2 fans at cooling."
)
ikegami
Tyler Rinker

1 Answer


I understand you have trouble with the excessive number of capturing groups. Turn those you are not interested in into non-capturing ones, or remove those that are plainly redundant:

((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\b(Fan)(s?)\b

See the regex demo
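As a small illustration of why this helps, the sketch below uses a shortened pattern (an assumption made for brevity, not the full regex above) to show that a (?:...) group does not take up a backreference number:

```r
x <- "we're fans"

# Capturing inner groups: \\2 is "we" and \\3 is "'", not "fan" and "s"
with_capturing <- sub("((you|they|we)( a|')re )(fan)(s?)",
                      "\\1\\2atic\\3", x, perl = TRUE)

# Non-capturing inner groups: only three groups remain, numbered 1 to 3
with_noncapturing <- sub("((?:you|they|we)(?: a|')re )(fan)(s?)",
                         "\\1\\2atic\\3", x, perl = TRUE)

with_capturing     # "we're weatic'"
with_noncapturing  # "we're fanatics"
```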

Note that [Ff] can be turned into F or f since you are using the ignore.case = TRUE argument.
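A quick check of that point, using made-up test words: with ignore.case = TRUE a plain fan already matches any casing, and because the replacement reuses the captured text, the original capitalization is kept.

```r
words <- c("fan", "Fan", "FAN")

# Case-insensitive matching makes the character class unnecessary
case_insensitive <- grepl("\\bfan\\b", words, perl = TRUE, ignore.case = TRUE)
# Without ignore.case, [Ff] only covers the first letter
class_only <- grepl("\\b[Ff]an\\b", words, perl = TRUE)

case_insensitive  # TRUE TRUE TRUE
class_only        # TRUE TRUE FALSE

# The backreference inserts the text as it was matched, so "Fan" stays capitalized
kept_case <- gsub(
    "((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\\b(fan)(s?)\\b",
    "\\1\\2atic\\3",
    "She is a Fan of his",
    perl = TRUE, ignore.case = TRUE
)
kept_case  # "She is a Fanatic of his"
```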

R demo:

gsub(
    "((?:s?he(?: i|')s|(?:you|they|we)(?: a|')re|I(?: a|')m).{1,20})\\b(fan)(s?)\\b", 
    '\\1\\2atic\\3', 
    inputs, 
    perl = TRUE, ignore.case = TRUE
)

Output:

[1] "He's the bigest fanatic I know."                     
[2] "I am a huge fanatic of his."                         
[3] "I know she has lots of fans in his club"             
[4] "I was cold and turned on the fan"                    
[5] "An air conditioner is better than 2 fans at cooling."
Wiktor Stribiżew