
I want to extract phone numbers from HTML using a regex. I am using this pattern:

\d{4}\s\d{3}\s\d{3}

to match phone numbers like 1234 546 567, and it extracts them from the HTML successfully.

The problem is that the HTML also contains numbers I don't want, such as 1234 567 89023. The regex extracts 1234 567 890 from that number, but I don't want it to match anything inside it.
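To reproduce the problem, here is a minimal sketch that runs the original pattern over a small piece of hypothetical HTML. The class name `UnanchoredDemo` and the sample input are made up for illustration. Because nothing in the pattern says "no digit may follow", the last `\d{3}` happily stops three digits into `89023`.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class, only for illustrating the question's pattern.
public static class UnanchoredDemo
{
    // 4 digits, whitespace, 3 digits, whitespace, 3 digits.
    // Nothing stops it from matching the start of a longer run of digits.
    private const string Pattern = @"\d{4}\s\d{3}\s\d{3}";

    public static List<string> Extract(string html)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

With the input `<td>1234 546 567</td><td>1234 567 89023</td>`, `Extract` returns both `1234 546 567` and the unwanted `1234 567 890`.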

Then I changed the regex to

^\d{4}\s\d{3}\s\d{3}$

but now it doesn't extract anything at all, not even the valid numbers.
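The reason the anchored version finds nothing can be sketched as follows (the class name `AnchoredDemo` is hypothetical). Without `RegexOptions.Multiline`, `^` and `$` refer to the start and end of the whole input string, so the pattern only succeeds when the entire input is a phone number and nothing else, which is never true for a page of HTML.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class illustrating the anchored pattern from the question.
public static class AnchoredDemo
{
    // ^ and $ anchor to the start and end of the whole string
    // (RegexOptions.Multiline is not set), so any surrounding markup
    // makes the match fail.
    private const string Pattern = @"^\d{4}\s\d{3}\s\d{3}$";

    public static bool MatchesWholeInput(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }
}
```

`MatchesWholeInput("1234 546 567")` is `true`, but `MatchesWholeInput("<td>1234 546 567</td>")` is `false` because of the tags around the number.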

What should I do?

Edit:

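// Requires: using System.Text.RegularExpressions;
// chk is the HTML text; my (presumably a Dictionary<string, string>) and
// mylist (presumably a List<string>) collect each number once.
string MatchAusPhoneNumber = @"\D(\d{4}\s\d{3}\s\d{3})\D";
MatchCollection mathph2 = Regex.Matches(chk, MatchAusPhoneNumber);

foreach (Match matchio in mathph2)
{
    // matchio.Value (and matchio.Captures) include the non-digit characters
    // on either side; Groups[1] holds only the phone number itself.
    string number = matchio.Groups[1].Value;
    if (!my.ContainsKey(number))
    {
        my.Add(number, number);
        mylist.Add(number);
    }
}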
Alan Moore
Nomi

1 Answer


Edit: I just reread your question, and it sounds like you want to extract groups of 4-3-3 digits from HTML, but only when they are not part of a longer number. If that is the case, try a regex like this:

\D(\d{4}\s\d{3}\s\d{3})\D

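\D matches any single character that is not a digit, so the number cannot be preceded or followed by another digit. The parentheses capture the phone number itself in the first capture group, so read it from Groups[1] rather than from the whole match.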
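Here is a minimal sketch of the `\D(...)\D` approach, reading the number from group 1. The class and method names are hypothetical. It also shows a swapped-in alternative that is not part of the answer above: lookarounds `(?<!\d)` and `(?!\d)`, which assert "no digit here" without consuming a character. That matters because `\D` needs an actual character on each side, so a number at the very start or end of the input is missed, and two numbers separated by a single character can't both match (the first match consumes the separator).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class for illustration.
public static class PhoneExtractor
{
    // The answer's pattern: a non-digit on each side, the number in group 1.
    private const string NonDigitPattern = @"\D(\d{4}\s\d{3}\s\d{3})\D";

    // Alternative: lookbehind/lookahead check for "no adjacent digit"
    // without consuming any characters.
    private const string LookaroundPattern = @"(?<!\d)\d{4}\s\d{3}\s\d{3}(?!\d)";

    public static List<string> WithNonDigits(string html)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, NonDigitPattern))
        {
            // m.Value would include the surrounding non-digit characters.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }

    public static List<string> WithLookarounds(string html)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, LookaroundPattern))
        {
            // No characters are consumed around the number, so m.Value is just the number.
            results.Add(m.Value);
        }
        return results;
    }
}
```

For `<td>1234 546 567</td><td>1234 567 89023</td>` both methods return only `1234 546 567`. For the input `a1234 546 567,1234 546 568b`, `WithNonDigits` finds one number while `WithLookarounds` finds both.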


Old answer: If you want the last group to be 3 to 5 digits long, try this:

\d{4}\s\d{3}\s\d{3,5}

\d{3,5} matches between 3 and 5 digits.
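A quick sketch of what `\d{3,5}` does with the two numbers from the question (the class name and sample input are hypothetical). The quantifier is greedy, so it takes as many digits as it can, up to five: the longer number is matched in full rather than cut off after three digits, which means this pattern accepts both numbers.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical demo class for the "old answer" pattern.
public static class VariableLastGroupDemo
{
    // The last group takes 3 to 5 digits, preferring as many as possible.
    private const string Pattern = @"\d{4}\s\d{3}\s\d{3,5}";

    public static List<string> Extract(string html)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }
}
```

For `<td>1234 546 567</td><td>1234 567 89023</td>` it returns `1234 546 567` and `1234 567 89023`. A last group of six or more digits would still be cut off after five.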

tckmn