Can someone please post a regex that extracts the domain from a URL conforming to RFC 1738 (http://www.ietf.org/rfc/rfc1738.txt)? The general form is:

PROTOCOL://USERNAME:PASSWORD@DOMAINNAME:PORT/QUERYSTRING

Example:

https://abc:password@answers.yahoo.com:777/question/index?qid=20100728205639

Thanks, Sumit

user3802925
  • I think you should summarize the character sets allowed in each part of the URL, so that someone with regex experience can put a correct pattern together. – Oliver May 13 '11 at 23:00
  • Hi Michael, I found a post (http://stackoverflow.com/questions/3624651/c-url-parser-using-boost-regex-match) that extracts the domain name, but its regex doesn't work for other kinds of URLs, such as: – user3802925 May 13 '11 at 23:11
  • https://abc:password@answers.yahoo.com:777/question/index?qid=20100728205639
    http://answers.yahoo.com:777/question/index
    https://abc:password@answers.yahoo.com/question/index?qid=20100728205639
    http://answers.yahoo.com/question/index?qid=20100728205639
    https://abc:password@answers.yahoo.com:777
    – user3802925 May 13 '11 at 23:11
  • I need a regex that can take care of all these scenarios – user3802925 May 13 '11 at 23:12

1 Answer

You can find one such regular expression here. You can probably simplify it, but that depends entirely on your needs.
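For the five URLs listed in the comments, something along these lines might be enough. This is only a rough sketch using std::regex from C++11, not the expression linked above: the function name extract_host and the pattern itself are my own assumptions, it expects the scheme://user:password@host:port/path shape from the question, and it does not validate the characters of the host or handle bracketed IPv6 literals.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the linked post): returns the host part of a
// URL shaped like scheme://user:password@host:port/path, or an empty string
// when the input does not have that shape.
std::string extract_host(const std::string& url) {
    // scheme "://" [userinfo "@"] host [":" port] [rest]
    // Capture group 1 is the host; every other group is non-capturing.
    static const std::regex pattern(
        R"re(^[A-Za-z][A-Za-z0-9+.-]*://(?:[^/?#@]*@)?([^/?#:@]+)(?::[0-9]*)?(?:[/?#].*)?$)re");

    std::smatch match;
    if (!std::regex_match(url, match, pattern)) {
        return std::string();  // not in the expected shape
    }
    return match[1].str();
}
```

Calling extract_host on any of the five URLs from the comments should give answers.yahoo.com, whether or not the credentials, port, or path are present. If you also need the user name, password, or port, turn the corresponding (?: ... ) groups into capturing ones.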

You can also use a library that provides functions for parsing URLs. A good starting point is the Stack Overflow thread "Easy way to parse a url in C++ cross platform?"

Boaz Yaniv