I am using URLs as a key so I need them to be consistent and clean. I need a python function that will take a URL and clean it up so that I can do a get from the DB. For example, it will take the following:
example.com
example.com/
http://example.com/
http://example.com
http://example.com?
http://example.com/?
http://example.com//
and output a clean consistent version:
http://example.com/
I looked through std libs and GitHub and couldn't find anything like this
Update
I couldn't find a Python library that implements everything discussed here and in the RFC:
http://en.wikipedia.org/wiki/URL_normalization
So I am writing one now. There is a lot more to this than I initially imagined.