
I want Chrome-like behavior: when a user enters a string, the browser decides whether to treat it as a web address or as a Google search query.

How can I achieve this?

This is what I am trying:

boolean modifiedUrlValid = Patterns.WEB_URL.matcher(modifiedUrl).matches();

But this does not work on Android 5.0 and above. Please help.

Regards,

pcs

1 Answer

This is a modified version of the original source for Patterns. Does it work for you? Use it the same way you used the original.

import java.util.regex.Pattern;

public class URLValidator {

    public static final String TOP_LEVEL_DOMAIN_STR_FOR_WEB_URL =
                     "(?:"
                     + "(?:aero|arpa|asia|a[cdefgilmnoqrstuwxz])"
                     + "|(?:biz|b[abdefghijmnorstvwyz])"
                     + "|(?:cat|com|coop|c[acdfghiklmnoruvxyz])"
                     + "|d[ejkmoz]"
                     + "|(?:edu|e[cegrstu])"
                     + "|f[ijkmor]"
                     + "|(?:gov|g[abdefghilmnpqrstuwy])"
                     + "|h[kmnrtu]"
                     + "|(?:info|int|i[delmnoqrst])"
                     + "|(?:jobs|j[emop])"
                     + "|k[eghimnprwyz]"
                     + "|l[abcikrstuvy]"
                     + "|(?:mil|mobi|museum|m[acdeghklmnopqrstuvwxyz])"
                     + "|(?:name|net|n[acefgilopruz])"
                     + "|(?:org|om)"
                     + "|(?:pro|p[aefghklmnrstwy])"
                     + "|qa"
                     + "|r[eosuw]"
                     + "|s[abcdeghijklmnortuvyz]"
                     + "|(?:tel|travel|t[cdfghjklmnoprtvwz])"
                     + "|u[agksyz]"
                     + "|v[aceginu]"
                     + "|w[fs]"
                     + "|(?:\u03b4\u03bf\u03ba\u03b9\u03bc\u03ae|\u0438\u0441\u043f\u044b\u0442\u0430\u043d\u0438\u0435|\u0440\u0444|\u0441\u0440\u0431|\u05d8\u05e2\u05e1\u05d8|\u0622\u0632\u0645\u0627\u06cc\u0634\u06cc|\u0625\u062e\u062a\u0628\u0627\u0631|\u0627\u0644\u0627\u0631\u062f\u0646|\u0627\u0644\u062c\u0632\u0627\u0626\u0631|\u0627\u0644\u0633\u0639\u0648\u062f\u064a\u0629|\u0627\u0644\u0645\u063a\u0631\u0628|\u0627\u0645\u0627\u0631\u0627\u062a|\u0628\u06be\u0627\u0631\u062a|\u062a\u0648\u0646\u0633|\u0633\u0648\u0631\u064a\u0629|\u0641\u0644\u0633\u0637\u064a\u0646|\u0642\u0637\u0631|\u0645\u0635\u0631|\u092a\u0930\u0940\u0915\u094d\u0937\u093e|\u092d\u093e\u0930\u0924|\u09ad\u09be\u09b0\u09a4|\u0a2d\u0a3e\u0a30\u0a24|\u0aad\u0abe\u0ab0\u0aa4|\u0b87\u0ba8\u0bcd\u0ba4\u0bbf\u0baf\u0bbe|\u0b87\u0bb2\u0b99\u0bcd\u0b95\u0bc8|\u0b9a\u0bbf\u0b99\u0bcd\u0b95\u0baa\u0bcd\u0baa\u0bc2\u0bb0\u0bcd|\u0baa\u0bb0\u0bbf\u0b9f\u0bcd\u0b9a\u0bc8|\u0c2d\u0c3e\u0c30\u0c24\u0c4d|\u0dbd\u0d82\u0d9a\u0dcf|\u0e44\u0e17\u0e22|\u30c6\u30b9\u30c8|\u4e2d\u56fd|\u4e2d\u570b|\u53f0\u6e7e|\u53f0\u7063|\u65b0\u52a0\u5761|\u6d4b\u8bd5|\u6e2c\u8a66|\u9999\u6e2f|\ud14c\uc2a4\ud2b8|\ud55c\uad6d|xn\\-\\-0zwm56d|xn\\-\\-11b5bs3a9aj6g|xn\\-\\-3e0b707e|xn\\-\\-45brj9c|xn\\-\\-80akhbyknj4f|xn\\-\\-90a3ac|xn\\-\\-9t4b11yi5a|xn\\-\\-clchc0ea0b2g2a9gcd|xn\\-\\-deba0ad|xn\\-\\-fiqs8s|xn\\-\\-fiqz9s|xn\\-\\-fpcrj9c3d|xn\\-\\-fzc2c9e2c|xn\\-\\-g6w251d|xn\\-\\-gecrj9c|xn\\-\\-h2brj9c|xn\\-\\-hgbk6aj7f53bba|xn\\-\\-hlcj6aya9esc7a|xn\\-\\-j6w193g|xn\\-\\-jxalpdlp|xn\\-\\-kgbechtv|xn\\-\\-kprw13d|xn\\-\\-kpry57d|xn\\-\\-lgbbat1ad8j|xn\\-\\-mgbaam7a8h|xn\\-\\-mgbayh7gpa|xn\\-\\-mgbbh1a71e|xn\\-\\-mgbc0a9azcg|xn\\-\\-mgberp4a5d4ar|xn\\-\\-o3cw4h|xn\\-\\-ogbpf8fl|xn\\-\\-p1ai|xn\\-\\-pgbs0dh|xn\\-\\-s9brj9c|xn\\-\\-wgbh1c|xn\\-\\-wgbl6a|xn\\-\\-xkc2al3hye2a|xn\\-\\-xkc2dl3a5ee0h|xn\\-\\-yfro4i67o|xn\\-\\-ygbi2ammx|xn\\-\\-zckzah|xxx)"
                     + "|y[et]"
                     + "|z[amw]))";
    public static final String GOOD_IRI_CHAR = "a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF";
    // Named WEB_URL so it can be used the same way as android.util.Patterns.WEB_URL.
    public static final Pattern WEB_URL = Pattern.compile("((?:(http|https|Http|Https|rtsp|Rtsp):\\/\\/(?:(?:[a-zA-Z0-9\\$\\-\\_\\.\\+\\!\\*\\'\\(\\)"
                    + "\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,64}(?:\\:(?:[a-zA-Z0-9\\$\\-\\_"
                    + "\\.\\+\\!\\*\\'\\(\\)\\,\\;\\?\\&\\=]|(?:\\%[a-fA-F0-9]{2})){1,25})?\\@)?)?"
                    + "((?:(?:[" + GOOD_IRI_CHAR + "][" + GOOD_IRI_CHAR + "\\-]{0,64}\\.)+"   // named host
                    + TOP_LEVEL_DOMAIN_STR_FOR_WEB_URL
                    + "|(?:(?:25[0-5]|2[0-4]" // or ip address
                    + "[0-9]|[0-1][0-9]{2}|[1-9][0-9]|[1-9])\\.(?:25[0-5]|2[0-4][0-9]"
                    + "|[0-1][0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1]"
                    + "[0-9]{2}|[1-9][0-9]|[1-9]|0)\\.(?:25[0-5]|2[0-4][0-9]|[0-1][0-9]{2}"
                    + "|[1-9][0-9]|[0-9])))"
                    + "(?:\\:\\d{1,5})?)" // plus option port number
                    + "(\\/(?:(?:[" + GOOD_IRI_CHAR + "\\;\\/\\?\\:\\@\\&\\=\\#\\~"  // plus option query params
                    + "\\-\\.\\+\\!\\*\\'\\(\\)\\,\\_])|(?:\\%[a-fA-F0-9]{2}))*)?"
                    + "(?:\\b|$)");

}

It was last updated by Google in 2011, which means it doesn't match many of the domains that have been added since. See here for the current list of TLDs. You can keep adding domains to the source above, but the growing alternation will eventually hurt matching performance.
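As an alternative to maintaining the TLD list, here is a rough sketch of a heuristic that does not depend on it. It is not how Chrome actually decides, and the class name `OmniboxInput`, the method `toLoadableUrl`, and the Google search URL prefix are assumptions made for illustration. Input with an explicit scheme is kept as is, input that looks like a dotted host name gets `http://` prepended, and everything else becomes a search query.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Pattern;

/** Hypothetical helper: decides whether typed text is loaded as a URL or searched. */
final class OmniboxInput {

    // Text that already carries a scheme, e.g. "http://" or "rtsp://".
    private static final Pattern HAS_SCHEME =
            Pattern.compile("^[a-zA-Z][a-zA-Z0-9+.-]*://\\S+$");

    // Dotted host name ending in any alphabetic label of 2+ letters,
    // followed by an optional port and an optional path/query/fragment.
    private static final Pattern LOOKS_LIKE_HOST = Pattern.compile(
            "^(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\\.)+[a-zA-Z]{2,}"
            + "(?::\\d{1,5})?(?:[/?#]\\S*)?$");

    // Assumed search endpoint; swap in whatever search URL your app uses.
    private static final String SEARCH_PREFIX = "https://www.google.com/search?q=";

    private OmniboxInput() {}

    static String toLoadableUrl(String input) {
        String text = input.trim();
        if (HAS_SCHEME.matcher(text).matches()) {
            return text;                      // explicit scheme: load as typed
        }
        if (LOOKS_LIKE_HOST.matcher(text).matches()) {
            return "http://" + text;          // bare host: assume http
        }
        try {
            // Anything else (spaces, no dot, numeric "TLD") becomes a search.
            return SEARCH_PREFIX + URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

The trade-off is that any alphabetic last label counts as a TLD, so a made-up name such as dsadakjsakdj.dskksajdas is loaded as a URL rather than searched, and a dotless host such as localhost is sent to search unless a scheme is typed.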

EDIT: Don't use URLUtil (android.webkit.URLUtil, not "URLUtils"), as it only checks whether the String starts with a known prefix such as http:// or https://

boolean isValid = URLUtil.isValidUrl(urlString);
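If you still want a quick structural check that is stricter than looking at the prefix, one option is to parse the string with `java.net.URI` and require a host. This is only a sketch under that assumption; the class name `StrictUrlCheck` and the method `hasHttpSchemeAndHost` are made up for illustration, and it says nothing about whether the host actually exists.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: accepts only http(s) URLs that parse and contain a host. */
final class StrictUrlCheck {

    private StrictUrlCheck() {}

    static boolean hasHttpSchemeAndHost(String candidate) {
        try {
            URI uri = new URI(candidate);
            String scheme = uri.getScheme();
            boolean httpLike = "http".equalsIgnoreCase(scheme)
                    || "https".equalsIgnoreCase(scheme);
            String host = uri.getHost();
            // A bare "http://" has no host, so it is rejected here.
            return httpLike && host != null && !host.isEmpty();
        } catch (URISyntaxException e) {
            // Unparseable input (spaces, missing authority, ...) is not a URL.
            return false;
        }
    }
}
```

With this check, "http://" on its own and text containing spaces are rejected, while "http://stackoverflow.com" is accepted.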
Bojan Kseneman
  • Are you sure? It seems that URLUtil.isValidUrl accepts things like "http://" as a valid URL (http://stackoverflow.com/a/27587741/4748828) – Samarth Agrawal May 25 '15 at 13:47
  • I just took a look at the source and unfortunately you are correct. It only checks whether the URL starts with http:// or https://. This is a terrible way to validate a URL. I can't imagine who would validate a URL like that; shame on whoever put this there. – Bojan Kseneman May 25 '15 at 14:59
  • Thanks @bojan-kseneman, I will try this today. I am intrigued as to how the phone's Chrome browser is able to do this. If Chrome also works with the latest TLDs, then the only explanation I can think of is that Google is doing the URL check on the server side. But that would also mean that even correct URLs typed into Chrome first go to a Google server and are then rerouted to the original server. – Samarth Agrawal May 26 '15 at 05:02
  • @SamarthAgrawal I don't think Chrome is matching TLDs. Try typing something like http://dsadakjsakdj.dskksajdas and it will try to connect to it, though I am pretty sure a TLD like that does not exist. – Bojan Kseneman May 26 '15 at 06:47
  • I see, @bojan-kseneman. Maybe our Chrome versions are different. On my phone I open Chrome and go to dsadakjsakdj.dskksajdas. Somehow Chrome understood that it is not a web URL and showed me the search results. Google knows!! Here's what I see (http://i.imgur.com/66grrc4.png). – Samarth Agrawal May 26 '15 at 09:16