
I have a dataframe like the following:

ColumnA <- c("Kuala Lumpur Sector 2 new", "old Jakarta Sector31",
             "Sector 9, 7 Hong Kong", "Jakarta new Sector22")
df1 <- data.frame(ColumnA)

From this I would like to extract the sector number in every row, i.e.:

2,31,9,22

In all cases the number is preceded by the word 'Sector', but there may or may not be a space between the word and the number. Although not shown in the example above, the text may also contain other, irrelevant numbers, which I want to ignore. The sector numbers are all small (never 100 or above), so at most two digits are involved.

I'm afraid that my regular expression experience is almost nil. Help would be greatly appreciated. Also, for my future use, if there are any good regex guides specific to R, I would appreciate the heads-up.

Asked by RichS (edited by MichaelChirico)

1 Answer


For example, using gsub with a capture group:

gsub(".*Sector ?([0-9]+).*","\\1",ColumnA)
[1] "2"  "31" "9"  "22"
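A minimal sketch building on the same pattern, assuming the goal is an integer column stored back in df1 (the column name `Sector` is hypothetical). The comments break the pattern into pieces; note that the leading `.*` is greedy, so if a string contained "Sector" twice, the last occurrence would be captured. Because the pattern matches the whole string at most once, `sub` is used here instead of `gsub`; both give the same result in this case.

```r
ColumnA <- c("Kuala Lumpur Sector 2 new", "old Jakarta Sector31",
             "Sector 9, 7 Hong Kong", "Jakarta new Sector22")
df1 <- data.frame(ColumnA, stringsAsFactors = FALSE)

# Pattern pieces:
#   .*        any leading text (greedy, so the last "Sector" wins)
#   Sector    the literal word
#    ?        an optional single space before the number
#   ([0-9]+)  one or more digits, captured as group 1
#   .*        any trailing text, including irrelevant numbers like the "7"
pattern <- ".*Sector ?([0-9]+).*"

# Replace each whole string by the captured digits, then convert to integer.
# "Sector" is hypothetical as a column name.
df1$Sector <- as.integer(sub(pattern, "\\1", df1$ColumnA))
df1$Sector
# [1]  2 31  9 22
```

The other number in "Sector 9, 7 Hong Kong" is ignored because it is consumed by the trailing `.*` rather than by the capture group.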
Answered by agstudy

  • Lovely! @RichS I highly recommend you check out [this](https://regex101.com/r/nC0eN6/1) for an explanation of what's going on. – MichaelChirico Aug 25 '15 at 01:26
  • @agstudy is there a straightforward way to catch those cases where we do not find the word 'Sector' and set these to blank? Many thanks! – RichS Aug 25 '15 at 07:08
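One possible way to handle the follow-up question, sketched under the assumption that "blank" means an empty string. When the pattern does not match, `gsub`/`sub` return the input string unchanged, so this sketch first tests each string with `grepl` and falls back to `""` where no 'Sector' number is found. The fifth string below is a made-up example without the word 'Sector'.

```r
ColumnA <- c("Kuala Lumpur Sector 2 new", "old Jakarta Sector31",
             "Sector 9, 7 Hong Kong", "Jakarta new Sector22",
             "Bangkok district 5")  # hypothetical row with no "Sector"

pattern <- ".*Sector ?([0-9]+).*"

# TRUE where the string contains "Sector" followed by digits
has_sector <- grepl(pattern, ColumnA)

# Keep the captured digits where matched, otherwise an empty string
sector <- ifelse(has_sector, sub(pattern, "\\1", ColumnA), "")
sector
# [1] "2"  "31" "9"  "22" ""
```

If a missing value is preferred over a blank, `NA` can be used in place of `""`; it is then kept through a later `as.integer()` conversion.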