
I am building an address matcher in R and am stuck on matching unit addresses, e.g. "22/106 Homer Street". I want to be able to extract the 106.

This is the correct regex: (?<=\/)\d+

Entering it into R as

data$door_number <- str_extract(data$Property_Address,"(?<=\/)\\d+")

produces the error:

'\/' is an unrecognised escape in character string starting ""(?<=\/"

I have tried multiple combinations of slashes but can't seem to extract the desired result in R.

Ronak Shah
12345667
  • Hi [Sheetal Bhundia](https://stackoverflow.com/users/10766923/sheetal-bhundia), please take a look at [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and modify your question to include a smaller sample taken from your data (check `?dput()`). Posting images of your data, or no data at all, makes it difficult or impossible for us to help you! – massisenergy Dec 24 '18 at 04:41
  • escape it, `str_extract("22/106 Homer Street", "(?<=/)\\d+")` – Ronak Shah Dec 24 '18 at 04:41
  • @RonakShah That duplicate is not specific enough, because I don't think the OP should even be using a lookaround here. – Tim Biegeleisen Dec 24 '18 at 04:45
  • @TimBiegeleisen Obviously, there are different ways to solve the problem but escaping the character gives OP's desired output. – Ronak Shah Dec 24 '18 at 05:01
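
The fix suggested in the comments can be checked without stringr. The following is a minimal sketch in base R, assuming a single sample address: inside an R string literal `\/` is not a valid escape sequence, and `/` has no special meaning in a regex, so the slash can be written bare. The lookbehind needs `perl = TRUE` in base R functions.

```r
address <- "22/106 Homer Street"

# "/" is not a regex metacharacter, so it needs no backslash.
# "\\d" is how the regex \d is written inside an R string literal.
m <- regexpr("(?<=/)\\d+", address, perl = TRUE)

# regmatches() returns the matched substring(s)
door <- regmatches(address, m)
door
# [1] "106"
```

If a backslash before the slash is wanted anyway, it has to be doubled (`"(?<=\\/)\\d+"`) so that R passes a single backslash on to the regex engine.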

2 Answers


Here is an alternative:

    library(stringr)
    somestring <- c("22/106 Homer Street.")
    newstring <- sapply(strsplit(somestring, "/"), "[", 2)  # text after the slash
    myaddress <- sapply(str_extract_all(newstring, "\\d{3,}"), "[")  # runs of 3+ digits
    myaddress
    [1] "106"

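It might be less useful for a very large dataset. Note that `\\d{3,}` only matches numbers with three or more digits, so shorter door numbers would need `\\d+` instead.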

NelsonGon

I don't like your current approach, because only checking for a preceding forward slash would also match something like ABC/123, if it occurred in some of your address strings. Since variable-width lookbehinds are not supported, I would recommend matching the full term instead, using sub:

address <- "22/106 Homer Street"
sub(".*\\d/(\\d+).*", "\\1", address)

[1] "106"
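
When applied to a whole column, note that `sub()` returns the input unchanged if the pattern does not match. The sketch below shows one way to handle that in base R; the sample addresses and the names `addresses`, `has_unit`, and `door_number` are hypothetical and not taken from the question's data.

```r
# Hypothetical sample addresses (not from the original data)
addresses <- c("22/106 Homer Street", "5 Smith Road", "ABC/123 Example Lane")

# A digit, a slash, then the captured door number
pattern <- ".*\\d/(\\d+).*"

# sub() leaves non-matching strings untouched, so flag the matches
# first and return NA for everything else
has_unit <- grepl(pattern, addresses)
door_number <- ifelse(has_unit, sub(pattern, "\\1", addresses), NA_character_)
door_number
# [1] "106" NA    NA
```

Only the first address has a digit directly before the slash, so the other two yield `NA` rather than the full address string.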
Tim Biegeleisen