I have the following dataframe

ColumnA=c("Kuala Lumpur Sector 2 new","old Jakarta Sector31",    "Sector 9, 7 Hong Kong","Jakarta new Sector22")

and am extracting the Sector number to a separate column

gsub(".*Sector ?([0-9]+).*","\\1",ColumnA)

Is there a more elegant way than an if/else statement to handle the case where 'Sector' does not appear on a line?

If the word 'Sector' does not appear on one line I simply want to set the value of that row to blank.

I thought of using str_detect first to see if 'Sector' was there TRUE/FALSE, but this is quite an ugly solution.

Thanks for any help.

RichS
  • Note: Technically speaking, `ColumnA` is a character vector, not a data.frame. And this [question](http://stackoverflow.com/questions/32194088/extracting-number-from-text-string-referencing-specific-text) is the reference. –  Aug 26 '15 at 06:38
  • I guess you want this: `ColumnA=c("Kuala Lumpur 2 new","old Jakarta Sector31", "Sector 9, 7 Hong Kong","Jakarta new Sector22"); gsub("^(?:.*Sector ?([0-9]+).*|.*)$","\\1",ColumnA)`. See [demo](http://ideone.com/ObaZTF). Right? – Wiktor Stribiżew Aug 26 '15 at 06:50

2 Answers

3

> If the word 'Sector' does not appear on one line I simply want to set the value of that row to blank.

To achieve that, use the alternation operator `|`:

ColumnA=c("Kuala Lumpur 2 new","old Jakarta Sector31",    "Sector 9, 7 Hong Kong","Jakarta new Sector22")
gsub("^(?:.*Sector ?([0-9]+).*|.*)$","\\1",ColumnA)

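Result: `[1] "" "31" "9" "22"`. Because "Kuala Lumpur 2 new" contains no "Sector", the first alternative fails and the second one, which has no capturing group, matches the whole string. Group 1 is therefore empty and the replacement is a blank string.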
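To see why the second alternative matters, here is a minimal sketch that runs the pattern from the question and the alternation version on the same vector. The names `without_alt` and `with_alt` are only illustrative and do not come from the original code.

```r
ColumnA <- c("Kuala Lumpur 2 new", "old Jakarta Sector31",
             "Sector 9, 7 Hong Kong", "Jakarta new Sector22")

# Pattern from the question: when nothing matches, gsub() returns the
# input string unchanged, so the first element keeps its full text.
without_alt <- gsub(".*Sector ?([0-9]+).*", "\\1", ColumnA)

# Alternation version: the bare `.*` branch matches strings without
# "Sector", and since group 1 did not take part, "\\1" is empty.
with_alt <- gsub("^(?:.*Sector ?([0-9]+).*|.*)$", "\\1", ColumnA)

print(without_alt)
print(with_alt)
```

Only the first element differs between the two results: it is left as the original text by the first call and becomes an empty string in the second.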

See IDEONE demo

Wiktor Stribiżew
1
library(stringr)
# str_extract() returns NA where "Sector<digits>" is absent; replace those NAs with ""
sectors <- str_extract(ColumnA, "(?<=Sector\\s{0,10})[0-9]+")
replace(sectors, is.na(sectors), "")

I think this is what you need.
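If stringr is not available, a similar idea can probably be sketched in base R. This is only an illustration under a few assumptions: it uses `\\K` with `perl = TRUE` in place of the look-behind, it allows at most one space after "Sector" (as in the question's pattern), and the names `m` and `sectors` are made up for the example.

```r
ColumnA <- c("Kuala Lumpur 2 new", "old Jakarta Sector31",
             "Sector 9, 7 Hong Kong", "Jakarta new Sector22")

# \\K drops "Sector" (and the optional space) from the reported match,
# so only the digits are returned. regexpr() gives -1 where nothing matches.
m <- regexpr("Sector ?\\K[0-9]+", ColumnA, perl = TRUE)

# Start with blanks everywhere, then fill in the elements that matched.
sectors <- rep("", length(ColumnA))
sectors[m != -1] <- regmatches(ColumnA, m)

print(sectors)
```

Starting from a vector of blanks avoids any explicit if/else: elements without a match simply keep their initial empty string.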

Stan Yip