I need to use a formula in a Google Sheets cel that will extract the FIRST brazilian phone number from a data in another cel. The phone number will be in the middle of a html code such as:
- Example 1:
<a href="tel:1123456789" class="btn-tel">(11) 2345-6789</a>
>>> Would return the first number:: 1123456789 - Example 2:
<strong>3003-1234</strong>
>>> Would return: 3003-1234 - Example 3:
<a _ngcontent-iqq-c82="" href="tel:30031234" class="ng-tns-c82-0">3003-1025 </a>
>>> Would return: 30031234
There are 3 different types of numbers:
- 0800abcdefg: where abcdefg could be numbers from 0 to 9 and this number could be presented as: 0800 abcdefg, 0800 abc defg or 0 800 abc defg.
- XXXXabcd: where XXXX could be 3003, 3004, 4003, 4004 or 4020 and abcd could be numbers from 0 to 9. This type of number could be presented as: XXXX-abcd, XXXXabcd, XXXX abcd or XXXX.abcd
- XXabcdefgh: where XX is a number that never has 0 (no 10,03,30...) and cannot be one of the following numbers: 23,25,26,29,36,39,52,56,57,58,59,72,76 or 78. The a number can be only 2-5 and b, c, d, e, f, g and h can be a number from 0 to 9. This number can be presented as: (XX)abcdefgh, (XX)abcd.efgh, (XX) abcd-efgh, XXabcdefgh, XX abcd efgh, XX.abcd.efgh, XX abcd-efgh
I tried using the following Regex with the formula REGEXTRACT in Google Sheets but it returns error saying that it is not a valid regex:
(?!23|25|26|29|36|39|52|56|57|58|59|72|76|78)(?:(((?)([1-9]{2})()?)( ?)[2-5])[0-9]{3}|3003|3004|4003|4004|4020|0( ?)800)(-?|.?| ?[0-9]{3})(-?|.?| ?)[0-9]{4}
Although it may not be an optimal regex, I have tried it on https://regex101.com/ and it worked just fine. BUt maybe it uses tokens not allowed in Google Sheets.
If that is not possible, the regex can consider that the numbers will be presented only as numbers, such as: 0800abcdefg,XXXXabcd or XXabcdefgh.