I'm trying to come up with my own variant of a URL regex to use in VBA.
This is what I currently have:
((https?\:\/\/)?([^\s\.\-]{1,}(?:(?:\.|\-)[^\s\.\-]{1,}){0,})(?=\.(?:[^\s]{1,}){0,2}\/|$)(\.ac|\.ad|\.ae|\.af|\.ag|\.ai|\.al|\.am|\.ao|\.aq|\.ar|\.as|\.at|\.au|\.aw|\.ax|\.az|\.ba|\.bb|\.bd|\.be|\.bf|\.bg|\.bh|\.bi|\.bj|\.bm|\.bn|\.bo|\.br|\.bs|\.bt|\.bw|\.by|\.bz|\.ca|\.cc|\.cd|\.cf|\.cg|\.ch|\.ci|\.ck|\.cl|\.cm|\.cn|\.co|\.cr|\.cu|\.cv|\.cw|\.cx|\.cy|\.cz|\.de|\.dj|\.dk|\.dm|\.do|\.dz|\.ec|\.ee|\.eg|\.es|\.et|\.eu|\.fi|\.fj|\.fk|\.fm|\.fo|\.fr|\.ga|\.gd|\.ge|\.gf|\.gg|\.gh|\.gi|\.gl|\.gm|\.gn|\.gp|\.gq|\.gr|\.gs|\.gt|\.gu|\.gw|\.gy|\.hk|\.hm|\.hn|\.hr|\.ht|\.hu|\.id|\.ie|\.il|\.im|\.in|\.io|\.iq|\.ir|\.is|\.it|\.je|\.jm|\.jo|\.jp|\.ke|\.kg|\.kh|\.ki|\.km|\.kn|\.kp|\.kr|\.kw|\.ky|\.kz|\.la|\.lb|\.lc|\.li|\.lk|\.lr|\.ls|\.lt|\.lu|\.lv|\.ly|\.ma|\.mc|\.md|\.me|\.mg|\.mh|\.mk|\.ml|\.mm|\.mn|\.mo|\.mp|\.mq|\.mr|\.ms|\.mt|\.mu|\.mv|\.mw|\.mx|\.my|\.mz|\.na|\.nc|\.ne|\.nf|\.ng|\.ni|\.nl|\.no|\.np|\.nr|\.nu|\.nz|\.om|\.pa|\.pe|\.pf|\.pg|\.ph|\.pk|\.pl|\.pm|\.pn|\.pr|\.ps|\.pt|\.pw|\.py|\.qa|\.re|\.ro|\.rs|\.ru|\.rw|\.sa|\.sb|\.sc|\.sd|\.se|\.sg|\.sh|\.si|\.sk|\.sl|\.sm|\.sn|\.so|\.sr|\.ss|\.st|\.su|\.sv|\.sx|\.sy|\.sz|\.tc|\.td|\.tf|\.tg|\.th|\.tj|\.tk|\.tl|\.tm|\.tn|\.to|\.tr|\.tt|\.tv|\.tw|\.tz|\.ua|\.ug|\.uk|\.us|\.uy|\.uz|\.va|\.vc|\.ve|\.vg|\.vi|\.vn|\.vu|\.wf|\.ws|\.ye|\.yt|\.za|\.zm|\.zw)(\/[^\s]{0,})?)
Currently I'm matching specific domain endings, because I want to exclude mobile app names (com.king.candycrushsodasaga should NOT be matched, for instance). It would be much better, however, if I could use a more generic regex to achieve this, since manually listing all those domain endings is neither effective nor efficient.
If there is a better way of doing so please let me know.
Appreciate any help.
Additional info: I'm trying to use this in an Excel workbook, where I can drop a bunch of URLs, including mobile apps (like com.king.candycrushsodasaga), into a table and flag the actual websites in a different column, so that non-websites like mobile apps can be excluded.
This is what the table looks like:
More background info:
I already have a VBA function that can be used as a worksheet formula. It takes two arguments: the cell/range containing the URL, and a range containing the regex. For some reason long strings throw #VALUE!, so I had to split some of the regexes.
This is what the formula looks like:
=IF(IsMatch([@Url];RegularExps[URL Regex 1]);"Website";"Other")
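For context, a UDF of the kind described could look roughly like the minimal sketch below. Only the name IsMatch and the two range arguments come from the formula above; the late binding to VBScript.RegExp, the case-insensitive matching, and reading only the first cell of each range are assumptions made for illustration, not the actual code.

```vba
' Hypothetical sketch of a two-argument regex UDF (not the original code).
Public Function IsMatch(ByVal target As Range, ByVal patternCell As Range) As Boolean
    Dim re As Object

    ' Late binding, so no project reference to
    ' "Microsoft VBScript Regular Expressions 5.5" is required.
    Set re = CreateObject("VBScript.RegExp")

    ' Assumption: the pattern sits in the first cell of the given range.
    re.Pattern = CStr(patternCell.Cells(1, 1).Value)
    re.IgnoreCase = True    ' assumption: URLs are matched case-insensitively
    re.Global = False       ' a single match is enough for a yes/no answer

    ' Assumption: the text to test sits in the first cell of the target range.
    IsMatch = re.Test(CStr(target.Cells(1, 1).Value))
End Function
```

With a function shaped like this, the formula above returns "Website" when the pattern in RegularExps[URL Regex 1] matches the text in [@Url], and "Other" otherwise.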
I already tried the regexes (or regexi, whatever the plural of regex is) from this post: What is the best regular expression to check if a string is a valid URL?
But I haven't been successful with any of them, as they either match the app names, throw #VALUE!, or exclude valid URLs.