
Thanks to this answer, I have been using the following code to validate a URL. It's just that there are so many possible options with the new .anything domains lately. So I figured that whatever Twitter treats as a URL (when posting a tweet), I will treat as one too, to follow the standard, so to say!

I want to know how Twitter validates a URL. Is there a library that Twitter uses which I could use as well? Please help me solve this common problem. Thanks a ton!

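import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static List<String> extractUrls(String input) {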
    List<String> result = new ArrayList<String>();

    // Note: the dot in "www\\." is escaped, and every percent-escape is
    // written as %[a-f\\d]{2} (exactly two hex digits).
    Pattern pattern = Pattern.compile(
        "(\\s)+\\b(((ht|f)tp(s?)\\:\\/\\/|~\\/|\\/)|(www\\.)?)" + 
        "(\\w+:\\w+)?(([-\\w]+\\.)+(com|org|net|gov" + 
        "|mil|biz|info|mobi|name|aero|jobs|museum|club" + 
        "|travel|[a-z]{2}))(:[\\d]{1,5})?" + 
        "(((\\/([-\\w~!$+|.,=]|%[a-f\\d]{2})+)+|\\/)+|\\?|#)?" + 
        "((\\?([-\\w~!$+|.,*:]|%[a-f\\d]{2})+=?" + 
        "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)" + 
        "(&(?:[-\\w~!$+|.,*:]|%[a-f\\d]{2})+=?" + 
        "([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)*)*" + 
        "(#([-\\w~!$+|.,*:=]|%[a-f\\d]{2})*)?\\b");

    Matcher matcher = pattern.matcher(input);
    while (matcher.find()) {
        // The pattern starts with (\\s)+, so strip the leading whitespace it captures.
        result.add(matcher.group().trim());
    }

    return result;
}
codeMan

2 Answers

As mentioned, you can use the twitter-text library. If you want to validate URLs yourself, you can check the top-level domain against the official IANA list of TLDs: http://data.iana.org/TLD/tlds-alpha-by-domain.txt
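
To illustrate the TLD-list idea, here is a minimal sketch rather than a complete validator. It assumes the file has one upper-case TLD per line, with comment lines starting with `#`, and that the host is already in ASCII (punycode) form. The names `TldCheck`, `parseTlds` and `hasKnownTld` are made up for this example; downloading and caching the file is left out.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class TldCheck {
    // Builds a lookup set from the lines of the TLD file.
    // Assumption: one TLD per line, comment lines start with '#'.
    static Set<String> parseTlds(List<String> lines) {
        Set<String> tlds = new HashSet<>();
        for (String line : lines) {
            String entry = line.trim();
            if (entry.isEmpty() || entry.startsWith("#")) {
                continue; // skip blank lines and the header comment
            }
            tlds.add(entry.toLowerCase(Locale.ROOT));
        }
        return tlds;
    }

    // Returns true if the last label of the host is in the TLD set.
    static boolean hasKnownTld(String host, Set<String> tlds) {
        int dot = host.lastIndexOf('.');
        if (dot < 0 || dot == host.length() - 1) {
            return false; // no dot at all, or nothing after the last dot
        }
        String tld = host.substring(dot + 1).toLowerCase(Locale.ROOT);
        return tlds.contains(tld);
    }
}
```

With a set built this way, the hard-coded `(com|org|net|...)` alternation in the question's regex could be replaced by a looser host pattern followed by a `hasKnownTld` check, so new TLDs only require refreshing the list.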

Terence Eden

Twitter publishes the twitter-text library, which has a lot of text-processing options. The Java version is at https://github.com/twitter/twitter-text/tree/master/java. If you want to do this on the client side, you can use the code from https://github.com/twitter/twitter-text

Narendra Yadala