
I'm trying to get better at regex, as I'm tired of constantly having to look up existing solutions instead of creating my own. I'm having a bit of difficulty understanding why this isn't working, though:

Trying to extract both phone numbers from the following string (numbers and address are random):

"+1-541-754-3010 156 Alphand_St. <J Steeve>\n 133, Green, Rd. <E Kustur> NY-56423 ;+1-541-914-3010\n"

So I'm using the following expression:

 /\+(.+)(?:\s|\b)/

These are the matches I'm getting back:

  1. 1-541-754-3010 156 Alphand_St.
  2. 1-541-914-3010

So I'm getting the last one correctly, but not the first one. Based on the expression, it should match anything between a + and a space or word boundary. But for some reason it's not stopping at the space after the first number. Am I going about this the wrong way?

Tim Baker
  • First of all, your regex won't even return the matches you mentioned because you're using a [greedy match](https://www.regular-expressions.info/repeat.html). Second, you need to understand what a [word boundary (`\b`)](https://www.regular-expressions.info/wordboundaries.html) is (note that there IS a word boundary between a digit and `-`). And finally, you shouldn't really be using a [dot](https://www.regular-expressions.info/dot.html) when all you want to match is numbers and hyphens. – 41686d6564 stands w. Palestine Jul 29 '18 at 05:27
  • Also, add a language tag and show how you're applying the regex. – Mad Physicist Jul 29 '18 at 06:00

2 Answers


Given the format of your search string, and since each number starts with a literal "+", I would just match the run of digits and separators (like the hyphen) that follows it:

/\+([0-9\-]+)/

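Your ".+" says to match everything until there's a \s. However, the dot also matches whitespace, and since + is greedy, the match runs past every \s it can and only stops at the last position on the line where \s (or \b) can still match.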
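To make the difference concrete, here is a minimal sketch in Python (the question has no language tag, so the choice of Python's `re` module is an assumption, as is treating each "\n" in the sample as a real newline). It runs the original greedy pattern and the digits-and-hyphens pattern over the same text:

```python
import re

# Sample text from the question; assumes each "\n" is a real newline.
text = (
    "+1-541-754-3010 156 Alphand_St. <J Steeve>\n"
    " 133, Green, Rd. <E Kustur> NY-56423 ;+1-541-914-3010\n"
)

# Greedy ".+" matches spaces too, so it runs to the end of each line
# before the trailing (?:\s|\b) group gets a chance to match.
greedy = re.findall(r"\+(.+)(?:\s|\b)", text)

# A class limited to digits and hyphens cannot cross the first space.
digits_only = re.findall(r"\+([0-9\-]+)", text)

print(greedy)       # ['1-541-754-3010 156 Alphand_St. <J Steeve>', '1-541-914-3010']
print(digits_only)  # ['1-541-754-3010', '1-541-914-3010']
```

The second number only looks correct with the greedy pattern because it happens to sit at the end of its line.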

  • Oh, I suppose that makes sense. Is there any way to use .+ up until a specific character? Or to exclude a character from . – Tim Baker Jul 29 '18 at 05:28
  • Well, yes there is, with /\+(.+?)\s/ as discussed below: the ? makes it lazy, so it will try to match as few characters as possible. I would still search for digits as in my answer and the one below. – Freddythunder Jul 29 '18 at 05:36
  • This (and CertainPerformance's answer as well) might (or might not) be sufficient for the OP's requirements, but be aware that it will also match things like `+----`, for example. @TimBaker Matching phone numbers [isn't that simple](https://stackoverflow.com/q/123559/4934172), _especially if you expect to match different formats_. If, however, all you have to match is one format, it might be simpler, but you'd need to write something specifically for that format. Example: [`\+\d{1,2}-\d{3}-\d{3}-\d{4}`](https://regex101.com/r/cHVXgf/1). – 41686d6564 stands w. Palestine Jul 29 '18 at 05:43
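The trade-off raised in the last comment can be checked with a short sketch. It is written in Python as an assumption (the question names no language), and the variable names are illustrative only. The strict pattern is the one quoted in the comment, and it only covers that single format:

```python
import re

# Sample text from the question; assumes each "\n" is a real newline.
text = (
    "+1-541-754-3010 156 Alphand_St. <J Steeve>\n"
    " 133, Green, Rd. <E Kustur> NY-56423 ;+1-541-914-3010\n"
)

permissive = re.compile(r"\+([\d-]+)")               # any run of digits/hyphens
strict = re.compile(r"\+\d{1,2}-\d{3}-\d{3}-\d{4}")  # one fixed format only

# The permissive pattern accepts a string with no digits at all.
junk_hits = permissive.findall("+----")       # ['----']
strict_junk_hits = strict.findall("+----")    # []

# The strict pattern has no capture group, so findall returns whole matches.
strict_hits = strict.findall(text)            # ['+1-541-754-3010', '+1-541-914-3010']

print(junk_hits, strict_junk_hits, strict_hits)
```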

Remember that dashes (-) are not word characters, so \b will match between, for example, 1 and -, between - and 5, and so on. Also, your current regex is greedy: it'll try to match as many characters as it can with the repeated ., which is why it goes all the way to the end of the first line (the position after the last character in the line is the last place where \s or \b can match). Making it lazy (with .+?) wouldn't fix it, though, because then it would terminate right after the 1 in 1-541 (because there is a word boundary between 1 and -).
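A quick way to see this is to list where \b matches and then try the lazy variants. This is a sketch in Python, which is an assumption since the question doesn't say which regex engine is in use; the sample is shortened to the start of the first line:

```python
import re

sample = "+1-541-754-3010 156 Alphand_St."

# Positions in "1-541" where \b matches: start, around the hyphen, and end.
boundaries = [m.start() for m in re.finditer(r"\b", "1-541")]

# Lazy ".+?" stops at the first place the trailing group can match,
# which is the word boundary right after the leading "1".
lazy_with_boundary = re.findall(r"\+(.+?)(?:\s|\b)", sample)

# Dropping \b from the group lets the lazy match run up to the first space.
lazy_space_only = re.findall(r"\+(.+?)\s", sample)

print(boundaries)          # [0, 1, 2, 5]
print(lazy_with_boundary)  # ['1']
print(lazy_space_only)     # ['1-541-754-3010']
```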

Try using a character set of digits and - instead:

\+([\d-]+)

https://regex101.com/r/ktbcHJ/1

CertainPerformance