Extracting number from text string referencing specific text

Question

I have a dataframe like the following:

ColumnA=c("Kuala Lumpur Sector 2 new","old Jakarta Sector31",
          "Sector 9, 7 Hong Kong","Jakarta new Sector22")
df1 <- data.frame(ColumnA)

from which I would like to extract the Sector in all instances, i.e.:

2,31,9,22

In all cases the number will be preceded by the word 'Sector'. However, there may or may not be a space before the number. Although not in the example above, there may also be other irrelevant numbers in the text string, which I want to ignore. The numbers all range from 1-30, so no 100s or above involved.

I'm afraid that my regular expression experience is almost nil. Help would be greatly appreciated. Also, for my future use, if there are any good regex guides specific to R, I would appreciate the heads-up.

score 3 · Accepted Answer · answered Aug 25 '15 at 01:22

For example, using gsub with a capture group:

gsub(".*Sector ?([0-9]+).*","\\1",ColumnA)
[1] "2"  "31" "9"  "22"
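
As a rough breakdown of why this works, here is the same pattern spelled out piece by piece. This is only a sketch: the variable names `pattern`, `sector` and `sector_num` are illustrative and not part of the original answer, and `sub` is used instead of `gsub` on the assumption that each string needs only one replacement.

```r
ColumnA <- c("Kuala Lumpur Sector 2 new", "old Jakarta Sector31",
             "Sector 9, 7 Hong Kong", "Jakarta new Sector22")

# .*         any text before the keyword
# Sector ?   the literal word "Sector", then an optional space
# ([0-9]+)   capture group 1: one or more digits (the value we want)
# .*         any text after the number
pattern <- ".*Sector ?([0-9]+).*"

# "\\1" replaces the whole matched string with the contents of group 1
sector <- sub(pattern, "\\1", ColumnA)

# The result is character; convert it if numbers are needed
sector_num <- as.integer(sector)
```

Because the digits must directly follow "Sector" (with at most one space), the stray 7 in "Sector 9, 7 Hong Kong" is ignored.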


agstudy

lovely! @RichS I highly recommend you check out [this](https://regex101.com/r/nC0eN6/1) for an explanation of what's going on. – MichaelChirico Aug 25 '15 at 01:26
@agstudy is there a straightforward way to catch those cases where we do not find the word 'Sector' and set these to blank? Many thanks! – RichS Aug 25 '15 at 07:08
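
One possible way to handle the no-match case raised in the last comment, offered as a sketch rather than the answerer's own solution: when the pattern does not match, `sub` returns the input unchanged, so the match can be tested first with `grepl`. The vector `x` and its "Hong Kong 7" element are made up here to show a string without 'Sector'.

```r
# Hypothetical input: the third element has no "Sector" in it
x <- c("Kuala Lumpur Sector 2 new", "old Jakarta Sector31",
       "Hong Kong 7", "Jakarta new Sector22")

pattern <- ".*Sector ?([0-9]+).*"

# TRUE where "Sector" followed by a number is present
has_sector <- grepl(pattern, x)

# Extract the number where there is a match, otherwise return a blank
sector <- ifelse(has_sector, sub(pattern, "\\1", x), "")
```

Swapping `""` for `NA_character_` would mark the missing cases as NA instead of blank, which may be easier to filter on later.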
