30

I am looking for something like java.net.URL in the Python standard library, Django, Zope, or anywhere else in Python. I want it mainly for semantic reasons: my analysis of the program in question shows that the URL plays an essential role in it, so a URL class would also be of great practical use in that program.

Of course I could write such a class on my own, but I'd like to look around before I start reinventing the wheel.

I did look at urllib2 and urlparse. urlparse basically has the functionality I need, but it doesn't encapsulate it in a class like java.net.URL. Compared with the design my analysis calls for, it works upside down: it offers functions that take URLs rather than a URL object that has methods.

I also looked into the source code of urlparse, at the classes SplitResult and ParseResult. They have some basic functionality and can be subclassed, but I would have to rewrite the rest of the urlparse functions as subclass methods.

I also found mxURL - Flexible URL Datatype for Python. It is very close to what I really want, but it seems to be overkill for my purpose.

Can anyone suggest another option? Should I proceed with reinventing the wheel?

My solution:

To get my URL class I did basically two things:

  1. Inherit from urlparse.ResultMixin.
  2. Define a function that only calls urlparse.urlparse() and transforms the result into the parameters of a URL instance.
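
The two steps above can be sketched roughly as follows. This is not the code from the question, which is not shown. It is a minimal sketch in Python 3 terms, where the `urlparse` module lives at `urllib.parse`, and it subclasses `ParseResult` directly rather than the Python 2 `urlparse.ResultMixin`. The names `URL`, `parse` and `join` are assumptions made for illustration.

```python
from urllib.parse import ParseResult, urljoin, urlparse


class URL(ParseResult):
    """Hypothetical URL value class built on the standard parse result."""

    __slots__ = ()  # no per-instance dict, same footprint as the tuple

    @classmethod
    def parse(cls, text):
        # Delegate the real work to urlparse() and re-wrap its six fields.
        return cls(*urlparse(text))

    def join(self, reference):
        # Resolve a relative reference against this URL via urljoin().
        return type(self).parse(urljoin(self.geturl(), reference))

    def __str__(self):
        return self.geturl()


url = URL.parse("http://www.cwi.nl:80/%7Eguido/Python.html")
print(url.hostname, url.port)
print(url.join("FAQ.html"))
```

Because `URL` is still a named tuple, the inherited attributes (`scheme`, `netloc`, `hostname`, `port`, and so on) and `geturl()` keep working unchanged. Only the module-level functions that should become methods need a thin wrapper like `join`.
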
Sam Hartman
sumid
  • Why do you need a class for that? – Cat Plus Plus May 29 '11 at 20:46
  • @Cat Plus Plus: having a class for URLs can be very convenient. So convenient in fact, that the Python standard library includes one. – Fred Foo May 29 '11 at 21:30
  • @larsmans: It's not that much more than a named tuple, really. – Cat Plus Plus May 29 '11 at 21:44
  • @Cat Plus Plus: what more would you expect from a URL class? ;) – Fred Foo May 29 '11 at 21:46
  • I think having a URL class is a great idea. A URL is a value object much like any other. Maybe it's more Java philosophy than Python, but, for example, constructing one from a string then getting scheme, host, path etc is a very strong case. Looks like `urlparse` does this job fine, but doesn't undermine the case for the Java class. – Joe Jan 14 '12 at 20:38
  • @larsmans I would expect setting GET arguments using a `url[key] = value` syntax. – Ram Rachum May 31 '12 at 18:03

5 Answers

25

urlparse does encapsulate URLs into a class, called ParseResult, so it can be considered a factory function for these. Straight from the Python docs:

>>> urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html',
            params='', query='', fragment='')

If you desperately want a class called URL to encapsulate your URLs, use an alias (URL = urlparse.ParseResult) or create an adapter.
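
As a minimal sketch of the alias route, written for Python 3 (where `urlparse` is found in `urllib.parse`): the alias name `URL` is taken from the sentence above, while the variable names are illustrative only. Since `ParseResult` is a named tuple, it is immutable, and a changed URL is obtained by building a new value with the named tuple method `_replace()`.

```python
from urllib.parse import ParseResult, urlparse

# The alias suggested above: URL is simply another name for ParseResult.
URL = ParseResult

url = urlparse("http://www.cwi.nl:80/%7Eguido/Python.html")
print(isinstance(url, URL))

# Named tuples cannot be modified in place; _replace() returns a new
# instance with the given fields swapped out.
secure = url._replace(scheme="https", netloc="www.cwi.nl")
print(secure.geturl())
```

The original `url` value is left untouched, which is the same "produce a new object" style that `pathlib` uses for paths.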

Fred Foo
  • ParseResult is a namedtuple. I can't figure out why. I want to change an existing URL but there's no clear way to do this with urllib. – n0nSmoker Oct 08 '13 at 11:46
  • @n0nSmoker Now, in the future, we have `pathlib` in the standard library, which is immutable, although you can easily produce a new, modified `Path`. – George Sovetov Jan 16 '18 at 18:06
6

You might want to consider having a look at furl, because it might answer your needs.

neutrinus
  • Thanks. I really like it! I'll give it a try ;-) – sumid Sep 11 '12 at 04:34
  • The downside of furl is that it doesn't (and [won't](https://github.com/gruns/furl/issues/15) ) handle [params](http://stackoverflow.com/questions/10988614/what-are-the-url-parameters-element-at-position-3-in-urlparse-result). But the question is who needs `params`. – sumid Jun 04 '13 at 21:26
  • @sumid from the [GitHub url](https://github.com/gruns/furl/issues/15) the maintainer says "If their use grows, I'll happily add them (or accept a pull request)." If you need params, I'm sure you could contribute it! :) – Wilfred Hughes Feb 12 '14 at 12:13
4

What we have as of 2018:

Only furl is being maintained today, but its major disadvantage is that it is mutable, which does not encourage best practices. (A good modern reference is pathlib, which consists of immutable classes.)

Overall, having a painless OO way to parse and construct URLs is great.

Update

yarl is worth looking at.

George Sovetov
4

~10 years late to the party here, but today pydantic provides several URL types that might be helpful for validating, storing and passing around URLs. With type hints and mypy becoming more and more prevalent, some might consider this a kind of standard.

ssc
2

urlpath is my go-to for a URL object. It mirrors the pathlib Path object.

Steven