I am new to programming and started with R since I need it for my Master's courses. Can someone help me understand the solution step by step?

awards <- c("Won 1 Oscar.",
  "Won 1 Oscar. Another 9 wins & 24 nominations.",
  "1 win and 2 nominations.",
  "2 wins & 3 nominations.",
  "Nominated for 2 Golden Globes. 1 more win & 2 nominations.",
  "4 wins & 1 nomination.")

sub(".*\\s([0-9]+)\\snomination.*$", "\\1", awards)

Solution: A vector of character strings containing:

Won 1 Oscar., 24, 2, 3, 2, 1
Sotos

2 Answers

The function call sub(".*\\s([0-9]+)\\snomination.*$", "\\1", awards) does the following:

In each entry of the character vector awards, it looks for a pattern of the form

  • any characters, possibly none (.*)
  • followed by a whitespace character (\\s)
  • followed by one or more digits ([0-9]+)
  • followed by a whitespace character (\\s)
  • followed by the word 'nomination' (nomination)
  • followed by any characters, possibly none (.*)
  • followed by the end of the string ($)

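If it finds such a pattern, it replaces the matched text with the number captured in the parentheses. Because the pattern stretches from the start to the end of the string, the whole entry is replaced by that number. If it finds no match, it leaves the entry as it is.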

Hence "Won 1 Oscar." stays as it is and "Won 1 Oscar. Another 9 wins & 24 nominations." is replaced by the number 24.
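The steps above can be checked directly in R. This is a minimal sketch using the awards vector from the question; the variable names pattern, matched and result are only illustrative. grepl() shows which entries contain the pattern at all, and sub() performs the replacement.

```r
awards <- c("Won 1 Oscar.",
  "Won 1 Oscar. Another 9 wins & 24 nominations.",
  "1 win and 2 nominations.",
  "2 wins & 3 nominations.",
  "Nominated for 2 Golden Globes. 1 more win & 2 nominations.",
  "4 wins & 1 nomination.")

pattern <- ".*\\s([0-9]+)\\snomination.*$"

# TRUE where the pattern is found, FALSE where it is not
matched <- grepl(pattern, awards)

# Replace the matched text (here: the whole entry) with the captured digits
result <- sub(pattern, "\\1", awards)

# Side-by-side view of input, match status and output
print(data.frame(awards, matched, result))
```

Only the first entry has matched equal to FALSE, which is why it comes back unchanged.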

ikop
  • Do the parentheses around [0-9]+, i.e. "([0-9]+)" as opposed to "[0-9]+", make a difference? Are the parentheses necessary so that the argument "\\1" takes the matched digits as the replacement, instead of the user specifying the replacement text? – Rakesh Pandian Apr 13 '17 at 12:13
  • Exactly. The parentheses around `[0-9]+` turn that part of the pattern into a capturing group. The parentheses capture the text matched by the pattern inside them. This text can then be referenced using `\\1` for the first group, `\\2` for the second group (if there is more than one in your regular expression), etc. – ikop Apr 13 '17 at 12:17
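To illustrate how `\\1` and `\\2` refer to separate groups, here is a small sketch with two capturing groups. The extended pattern and the variable names x, two_groups and out are made up for this illustration and are not part of the original question.

```r
x <- "Won 1 Oscar. Another 9 wins & 24 nominations."

# First group: digits before "wins"; second group: digits before "nomination"
two_groups <- ".*\\s([0-9]+)\\swins.*\\s([0-9]+)\\snomination.*$"

# \\1 inserts the first captured group, \\2 the second
out <- sub(two_groups, "\\1 wins, \\2 nominations", x)
print(out)
```

Each pair of parentheses is numbered by the position of its opening parenthesis, counted from left to right.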
[1] "Won 1 Oscar."

No pattern match, so the element is returned unchanged.

[2] "24"

The pattern matched the whole vector element, and 24 was captured by ([0-9]+) as the first group. "\\1" refers to that first group, so the whole element is replaced by it.

The remaining elements work analogously.
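To see the full match and the captured group for every element, one option is regexec() together with regmatches(), both from base R. This is only a sketch for inspection; the names pattern and m are illustrative.

```r
awards <- c("Won 1 Oscar.",
  "Won 1 Oscar. Another 9 wins & 24 nominations.",
  "1 win and 2 nominations.",
  "2 wins & 3 nominations.",
  "Nominated for 2 Golden Globes. 1 more win & 2 nominations.",
  "4 wins & 1 nomination.")

pattern <- ".*\\s([0-9]+)\\snomination.*$"

# One list element per entry of awards:
#   no match -> character(0)
#   match    -> c(full match, first captured group)
m <- regmatches(awards, regexec(pattern, awards))
print(m)
```

For the matching elements, the first value is the entire string (the part that sub() replaces) and the second is the digits that `\\1` stands for.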

wolf_wue