24

I'm coming from Java and learning Python. What I have found very cool so far, yet very hard to adapt to, is that there's no need to declare types. I understand that each variable is a pointer to an object, but I can't yet see how to design my code around that.

For example, I'm writing a function that accepts a 2D NumPy array. Then in the body of the function I'm calling different methods of this array (which is an instance of NumPy's array class). But suppose I want to use this function in the future; by that time I might have totally forgotten what type I should pass to it. What do people normally do? Do they just write documentation for this? If so, that involves more typing and calls into question the point of not declaring types.

Also suppose I want to pass an object similar to an array in the future. Normally in Java one would define an interface and then let both classes implement its methods. Then in the function parameters I define the variable to be of the type of the interface. How can this issue be solved in Python, or what approaches can be used to achieve the same idea?

Peter Mortensen
Jack Twain
  • 1
    The purpose of duck typing is not to write less code in the first place – Niklas R Mar 02 '14 at 13:55
  • 1
    If your code actually, for some reason, really depends on a specific type passed in, you can use `assert isinstance(foo, Foo)` as the first line of the function/method (this also serves as documentation when reading the code), but that often really just limits what can be done with that function later. – Erik Kaplun Mar 02 '14 at 17:57
  • 4
    @ErikAllik: don't assert on that, raise TypeError – Neil G Mar 02 '14 at 21:38
  • @NeilG: well, yeah, true; depends tho; sometimes you want `assert`ions to make sure you've understood your own code correctly, but of course that's more general than parameter types anyway. – Erik Kaplun Mar 02 '14 at 23:57
  • @ErikAllik: https://mail.python.org/pipermail/python-list/2013-November/660401.html – Neil G Mar 02 '14 at 23:59
  • Yea, I know all that. – Erik Kaplun Mar 03 '14 at 00:31
  • Specifically, it says, "assertions should be used for ... * runtime checks on program logic; * checking contracts (e.g. pre-conditions and post-conditions);". As long as it's a defined requirement that the caller must pass in a 2-D `numpy` array, then asserting that pre-condition would fall into both of those categories. Sure, in principle you should contract that anything that walks like a `numpy` array is good enough (e.g. to allow mocks and whatnot), but I think that issue is separate from the question of which exception to raise given you are testing it. – Steve Jessop Mar 03 '14 at 01:26
  • ... SO you should raise `TypeError` as a defined behavior in response to a type your function cannot use, when you have contracted to check the type for the caller and inform them whether it is correct or not. If you have not guaranteed to perform the check, then a failed assertion is adequate (as is almost any other exception, especially if it results from trying to use the object as if it were the required type -- `AttributeError` commonly crops up). There's an important difference between a check that implements guaranteed behavior of the function vs. a check of pre-conditions. – Steve Jessop Mar 03 '14 at 01:34
  • @SteveJessop: I don't agree that assert is ever appropriate in this case. It is much more useful to raise the right kind of exception. Assert should mean that something is fundamentally wrong with the library code — not that the client code sent to the library some incorrect input. You would not expect numpy to assert on invalid arguments. – Neil G Mar 03 '14 at 04:32
  • @NeilG: I understand what you're saying and I'm disagreeing with it. The source you chose to cite specifically mentions checking pre-conditions as a suitable use for assert, and I agree with it on that point. If you don't agree then you should probably find a different source closer to your preferred style, so as to give advice closer to what you want to say :-) – Steve Jessop Mar 03 '14 at 09:13
  • @SteveJessop: The preconditions that AssertionError should be raised for exclude those conditions that other kinds of exceptions should be raised for like TypeError. I thought Steve made that clear in his introduction: "Many people use asserts as a quick and easy way to raise an exception if an argument is given the wrong value. But this is wrong, dangerously wrong, for two reasons. The first is that AssertionError is usually the wrong error to give when testing function arguments. You wouldn't write code like this:…" – Neil G Mar 03 '14 at 18:19
  • Anyway, I suspect we have some basic misunderstanding of context here because it doesn't seem at all complicated or controversial to me to say one can assert the truth of preconditions. There do not need to be special kinds of precondition violations with their own exceptions, applicable to all Python programmers everywhere. Of course if the function is *defined* to throw a particular exception when the wrong type is passed in, then you would choose `TypeError` not `AssertionError`. Then the type of the argument *is not a precondition*, it's a case the function must handle per spec. – Steve Jessop Mar 03 '14 at 21:52

6 Answers

32

This is a very healthy question.

Duck typing

The first thing to understand about Python is the concept of duck typing:

If it walks like a duck, and quacks like a duck, then I call it a duck

Unlike in Java, variable types in Python are not declared. Neither at compile time nor at runtime is there any restriction on the type of object a name can refer to.

What you do is simply treat objects as if they were of the perfect type for your needs. You don't ask or wonder about its type. If it implements the methods and attributes you want it to have, then that's that. It will do.

def foo(duck):
    duck.walk()
    duck.quack()

The only contract of this function is that duck exposes walk() and quack(). A more refined example:

def foo(sequence):
    for item in sequence:
        print(item)

What is sequence? A list? A numpy array? A dict? A generator? It doesn't matter. If it's iterable (that is, it can be used in a for ... in), it serves its purpose.
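
As a minimal sketch of the point above (the helper name `collect` is made up for illustration), the same function accepts any iterable without ever checking its type:

```python
def collect(sequence):
    """Return the items of any iterable as a list, in iteration order."""
    items = []
    for item in sequence:  # works for anything that supports iteration
        items.append(item)
    return items


from_list = collect([1, 2, 3])
from_dict = collect({"a": 1, "b": 2})              # iterating a dict yields its keys
from_generator = collect(n * n for n in range(3))  # a generator is consumed lazily
from_string = collect("ab")                        # a string iterates over characters
```

Note the last call: a string is iterable too, so passing one where a list of strings was intended will not fail; it will quietly iterate over the characters.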

Type hinting

Of course, no one can live in constant fear of objects being of the wrong type. This is addressed with coding style, conventions and good documentation. For example:

  • A variable named count should hold an integer
  • A variable Foo starting with an upper-case letter should hold a type (class)
  • An argument bar whose default value is False, should hold a bool too when overridden

Note that the duck typing concept can be applied to these 3 examples:

  • count can be any object that implements +, -, and <
  • Foo can be any callable that returns an object instance
  • bar can be any object that implements __bool__ (called __nonzero__ in Python 2)

In other words, the type is never defined explicitly, but always strongly hinted at. Or rather, the capabilities of the object are always hinted at, and its exact type is not relevant.

It's very common to use objects of unknown types. Most frameworks expose types that look like lists and dictionaries but aren't.

Finally, if you really need to know, there's the documentation. You'll find Python documentation vastly superior to Java's. It's always worth the read.

salezica
  • 5
    This is a very healthy answer. – Maxime Lorant Mar 02 '14 at 13:45
  • what does healthy question mean in the first place? :p – Jack Twain Mar 02 '14 at 13:58
  • @AlexTwain In my opinion: "very interesting and good to ask. It's not a silly question about syntax error but a real question about the philosophy of Python that every Python programmer should know" – Maxime Lorant Mar 02 '14 at 14:00
  • OK now I got the time to read. The answer assumes that there are always appropriate names. If you consider my example, what would you call the 2D array then? 2d_numpy_array? – Jack Twain Mar 02 '14 at 14:35
  • `array` is just fine. Remember that it's how you *use* the variable that matters, the interface you assume it to implement. If you find a piece of code confusing, comment it: `# numpy.array`, but until you feel the need, just leave it be. It takes a while to get used to this coming from Java, be patient – salezica Mar 02 '14 at 14:52
  • @uʍopǝpısdn: The question is, how do you know, without reading the body of the function, what is required of the argument being passed in? (Whether it is type or capabilities is a different matter.) You're saying that all the information needed should be stuffed into the variable name, but that seems impractical. In practice, I think documentation becomes necessary in Python. – ShreevatsaR Mar 02 '14 at 17:01
  • I never said that "all the information should be stuffed into the variable name". The examples above are quite simple, and they address simple needs. For complex interfaces, documentation is the only way -- but this is true of both python and java – salezica Mar 02 '14 at 19:17
  • @uʍopǝpısdn: Right, I think that's the answer to the OP's question: even in circumstances where the input could be described by the type in a good typed language, it must instead be described in a line of documentation in Python. – ShreevatsaR Mar 02 '14 at 19:56
  • 7
    "A variable named `count` should hold an integer" -- or a European aristocrat. I joke of course, but actually in practice one does need to be careful in "self-documenting" code about real ambiguities or imprecise names. – Steve Jessop Mar 03 '14 at 01:12
  • 1
    Sometimes duck typing doesn't work out. If your function expects a sequence of strings and you pass it a string, your function will just end up iterating over the individual characters. – Gabe Mar 03 '14 at 05:45
7

I've reviewed a lot of Python code written by Java and .Net developers, and I've repeatedly seen a few issues I might warn/inform you about:

Python is not Java

Don't wrap everything in a class:

It seems like even the simplest function winds up wrapped in a class when Java developers start writing Python. Python is not Java. Don't write getters and setters; that's what the property decorator is for.
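
A small sketch of what that looks like in practice. The `Temperature` class is hypothetical and only serves to illustrate `@property`; a plain attribute covers the simple case, and a property covers the computed one:

```python
class Temperature:
    """Hypothetical class used only to illustrate @property."""

    def __init__(self, celsius):
        self.celsius = celsius  # plain attribute, no getter/setter needed

    @property
    def fahrenheit(self):
        # Computed on access, but read like an ordinary attribute.
        return self.celsius * 9 / 5 + 32

    @fahrenheit.setter
    def fahrenheit(self, value):
        # Assigning to .fahrenheit updates the underlying celsius value.
        self.celsius = (value - 32) * 5 / 9


t = Temperature(100)
boiling_f = t.fahrenheit   # 212.0
t.fahrenheit = 32
freezing_c = t.celsius     # 0.0
```

Callers write `t.fahrenheit`, not `t.get_fahrenheit()`, so an attribute can later become a property without changing any calling code.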

I have two predicates before I consider writing classes:

  1. I am marrying state with functionality
  2. I expect to have multiple instances (otherwise a module-level dict and functions are fine!)

Don't type-check everything

Python uses duck-typing. Refer to the data model. Its builtin type coercion is your friend.

Don't put everything in a try-except block

Only catch exceptions you know you'll get. Using exceptions everywhere for control flow can be computationally expensive and can hide bugs. Try to use the most specific exception you expect you might get. This leads to more robust code over the long run.
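
To make this concrete, here is a hedged sketch with a made-up helper, `parse_port`. It catches only the one exception the author expects and lets everything else surface:

```python
def parse_port(text, default=8080):
    """Hypothetical helper: convert text to an int port, or use a default."""
    try:
        return int(text)
    except ValueError:
        # Only the failure we expect (non-numeric text) is handled here.
        # Anything else, such as a TypeError for None, still propagates.
        return default


port = parse_port("443")        # 443
fallback = parse_port("abc")    # 8080
```

A bare `except:` in the same place would also swallow the `TypeError` raised by `parse_port(None)`, which is most likely a bug in the caller that you want to see.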

Learn the built-in types and methods, in particular:

From the data-model

str

  • join
  • just do dir(str) and learn them all.

list

  • append (add an item on the end of the list)
  • extend (extend the list by adding each item in an iterable)
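
A quick illustration of the difference between the two (variable names are arbitrary):

```python
numbers = [1, 2]
numbers.append([3, 4])   # adds the list itself as a single item
# numbers is now [1, 2, [3, 4]]

letters = ["a", "b"]
letters.extend("cd")     # adds each item of the iterable, here each character
# letters is now ["a", "b", "c", "d"]
```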

dict

  • get (provide a default that saves you from having to catch KeyErrors!)
  • setdefault (set from the default or the value already there!)
  • fromkeys (build a dict with default values from an iterable of keys!)
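
A short sketch of the three dict methods listed above, using made-up data:

```python
counts = {"apple": 2}

missing = counts.get("pear", 0)   # 0, and no KeyError is raised
# get() does not insert the key: "pear" is still absent from counts

groups = {}
groups.setdefault("fruit", []).append("apple")   # creates the list on first use
groups.setdefault("fruit", []).append("pear")    # reuses the existing list

flags = dict.fromkeys(["a", "b"], False)         # {"a": False, "b": False}
```

One caveat with `fromkeys`: the default object is shared by every key, so a mutable default such as `[]` would be the same list for all of them.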

set

Sets contain unique (no repetition) hashable objects (like strings and numbers). Thinking Venn diagrams? Want to know if a set of strings is in a set of other strings, or what the overlaps are (or aren't)?

  • union
  • intersection
  • difference
  • symmetric_difference
  • issubset
  • isdisjoint
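
The set methods above map directly onto Venn-diagram questions. A sketch with invented permission names:

```python
required = {"read", "write"}
granted = {"read", "write", "execute"}
other = {"admin"}

both = required.intersection(granted)              # {"read", "write"}
either = required.union(other)                     # {"read", "write", "admin"}
extra = granted.difference(required)               # {"execute"}
not_shared = granted.symmetric_difference(other)   # items in exactly one of the two
covered = required.issubset(granted)               # True
no_overlap = required.isdisjoint(other)            # True
```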

And just do dir() on every type you come across to see the methods and attributes in its namespace, and then do help() on the attribute to see what it does!

Learn the built-in functions and standard library:

I've caught developers writing their own max functions and set objects. It's a little embarrassing. Don't let that happen to you!

Important modules to be aware of in the Standard Library are:

  • os
  • sys
  • collections
  • itertools
  • pprint (I use it all the time)
  • logging
  • unittest
  • re (regular expressions are incredibly efficient at parsing strings for a lot of use-cases)

Also peruse the docs for a brief tour of the standard library: here's Part I and here's Part II. In general, make skimming all of the docs an early goal.

Read the Style Guides:

You will learn a lot about best practices just by reading your style guides! I recommend:

Additionally, you can learn great style by Googling for the issue you're looking into with the phrase "best practice" and then selecting the relevant Stack Overflow answers with the greatest number of upvotes!

I wish you luck on your journey to learning Python!

cirrusio
Russia Must Remove Putin
2

For example, I'm writing a function that accepts a 2D NumPy array. Then in the body of the function I'm calling different methods of this array (which is an instance of NumPy's array class). But suppose I want to use this function in the future; by that time I might have totally forgotten what type I should pass to it. What do people normally do? Do they just write documentation for this?

You write documentation and name the function and variables appropriately.

def func(two_d_array):
    """Expects a 2D NumPy array (or any object that behaves like one)."""
    ...  # do stuff

Also suppose I want to pass an object similar to an array in the future. Normally in Java one would define an interface and then let both classes implement its methods.

You could do this. Create a base class and inherit from it, so that multiple types have the same interface. However, quite often, this is overkill and you'd simply use duck typing instead. With duck typing, all that matters is that the object being evaluated defines the right properties and methods required to use it within your code.
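
Both routes can be sketched side by side. This is only an illustration under assumptions: the names `Grid`, `ListGrid`, `DuckGrid` and `cell_count` are invented, and the standard-library `abc` module is one common way to write such a base class, not the only one.

```python
from abc import ABC, abstractmethod


class Grid(ABC):
    """Hypothetical interface: anything that can report a 2D shape."""

    @abstractmethod
    def shape(self):
        """Return (rows, columns)."""


class ListGrid(Grid):
    """Inherits from Grid, so it must implement shape()."""

    def __init__(self, rows):
        self.rows = rows

    def shape(self):
        return (len(self.rows), len(self.rows[0]) if self.rows else 0)


class DuckGrid:
    """Does not inherit from Grid, but offers the same method."""

    def shape(self):
        return (1, 1)


def cell_count(grid):
    # Relies only on grid.shape(), so both classes work here.
    rows, columns = grid.shape()
    return rows * columns


from_base_class = cell_count(ListGrid([[1, 2], [3, 4]]))   # 4
from_duck = cell_count(DuckGrid())                          # 1
```

The base class buys one thing: a subclass that forgets to implement `shape()` cannot be instantiated (Python raises `TypeError`). The function itself does not care either way.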

Note that you can check for types in Python, but this is generally considered bad practice because it prevents you from using duck typing and other coding patterns enabled by Python's dynamic type system.

Chinmay Kanchi
  • 2
    Your first argument is weakened by the fact that your example is invalid syntax, and when fixed remains (1) ugly and (2) "Systems Hungarian notation" with all its disadvantages. It's true that useful type information can be conveyed in names, but the names should still be meaningful beyond that. –  Mar 02 '14 at 13:37
  • Invalid syntax? You mean the `do stuff` line? I believe that bit is self-explanatory... As for your second point, the example is intended to be very generic. I would hope that the intention was clear here. – Chinmay Kanchi Mar 02 '14 at 13:40
  • No, I refer to `2d_array`, which is not an identifier. –  Mar 02 '14 at 13:40
  • Doh! Fixed. Good catch! – Chinmay Kanchi Mar 02 '14 at 13:41
  • thx for pointing out! I forgot the "consenting adults" saying ;P – zhangxaochen Mar 02 '14 at 14:16
1

Yes, you should document what type(s) of arguments your methods expect, and it's up to the caller to pass the correct type of object. Within a method, you can write code to check the types of each argument, or you can just assume it's the correct type, and rely on Python to automatically throw an exception if the passed-in object doesn't support the methods that your code needs to call on it.

The disadvantage of dynamic typing is that the computer can't do as much up-front correctness checking, as you've noted; there's a greater burden on the programmer to make sure that all arguments are of the right type. But the advantage is that you have much more flexibility in what types can be passed to your methods:

  • You can write a method that supports several different types of objects for a particular argument, without needing overloads and duplicated code.
  • Sometimes a method doesn't really care about the exact type of an object as long as it supports a particular method or operation — say, indexing with square brackets, which works on strings, arrays, and a variety of other things. In Java you'd have to create an interface, and write wrapper classes to adapt various pre-existing types to that interface. In Python you don't need to do any of that.
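
A minimal sketch of the square-bracket case; `first_and_last` is an invented name. One function serves strings, lists and tuples with no interface or wrapper class:

```python
def first_and_last(items):
    """Return the first and last element of anything indexable."""
    return items[0], items[-1]


from_string = first_and_last("duck")        # ("d", "k")
from_list = first_and_last([1, 2, 3])       # (1, 3)
from_tuple = first_and_last((True, None))   # (True, None)
```

Passing something that does not support indexing, such as `first_and_last(5)`, makes Python raise a `TypeError` on its own, which is the automatic exception mentioned above.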
Wyzard
0

You can use assert to check if conditions match:

In [217]: import numpy as np

In [218]: def foo(arg):
     ...:     assert type(arg) is np.ndarray and arg.ndim == 2, \
     ...:         'the argument must be a 2D numpy array'
     ...:     print('good arg')

In [219]: foo(np.arange(4).reshape((2,2)))
good arg

In [220]: foo(np.arange(4))
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-220-c0ee6e33c83d> in <module>()
----> 1 foo(np.arange(4))

<ipython-input-218-63565789690d> in foo(arg)
      1 def foo(arg):
      2     assert type(arg) is np.ndarray and arg.ndim == 2, \
----> 3         'the argument must be a 2D numpy array'
      4     print('good arg')

AssertionError: the argument must be a 2D numpy array

It's always better to document what you've written completely as @ChinmayKanchi mentioned.

zhangxaochen
  • 5
    While technically correct and occasionally appropriate, type checking should always be used as a last resort in Python. If an object behaves the same as a numpy array as far as the function is concerned, the actual type hierarchy should not matter. – Chinmay Kanchi Mar 02 '14 at 13:37
  • 1
    @ChinmayKanchi how do you check if the argument behaves *totally* the same as numpy arrays? – zhangxaochen Mar 02 '14 at 13:42
  • 3
    You don't, unless you have a specific reason to expect that a function may receive invalid arguments under normal operation of the program. You just assume that the caller knows what s/he is doing and let Python throw a `TypeError`/`KeyError`/`NameError` when the caller does something unexpected. – Chinmay Kanchi Mar 02 '14 at 13:43
  • @zhangxaochen Typically, you don't. For some cases, there are abstract base classes. Note that a fair comparison to static type systems needs to be lenient about what counts as "totally the same behavior", because static type systems check some invariants, but far from all (e.g. exceptions thrown, *interpretation* of arguments). –  Mar 02 '14 at 13:43
  • I'd argue that most functions do however operate on either one type or a very limited set of types, in practice. – Erik Kaplun Mar 02 '14 at 17:59
  • No, you should never use assertions to verify type. Raise a TypeError. – Neil G Mar 02 '14 at 21:35
0

Here are a few pointers that might help you make your approach more 'Pythonic'.

The PEPs

In general, I recommend at least browsing through the PEPs. It helped me a lot to grok Python.

Pointers

Since you mentioned the word pointers, Python doesn't use pointers to objects in the sense that C uses pointers. I am not sure about the relationship to Java. Python uses names attached to objects. It's a subtle but important difference that can cause you problems if you expect similar-to-C pointer behavior.
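
A small sketch of what "names attached to objects" means in practice (the variable names are arbitrary):

```python
a = [1, 2]
b = a                 # b is a second name for the same list object
b.append(3)           # mutates the shared object, so a sees the change too
same_object = a is b  # True at this point

b = [0]               # rebinds the name b; the object a refers to is untouched
# a is still [1, 2, 3], b is now [0]
```

Assignment never copies an object and never writes "through" a name into another one; it only changes which object the name on the left refers to.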

Duck Typing

As you said, yes, if you are expecting a certain type of input you put it in the docstring.

As zhangxaochen wrote, you can use assert to check the types of your arguments at runtime, but that's not really the Python way if you do it all the time for no particular reason. As others mentioned, if you do have to check, it's better to test and raise a TypeError. Python favors duck typing instead: if you send me something that quacks like a numpy 2D array, then that's fine.
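
If you do decide to check, a hedged sketch of the "test and raise a TypeError" approach might look like this. The names `require_2d` and `FakeArray` are invented for illustration. The check looks at the `ndim` attribute (which NumPy arrays provide) rather than at the concrete class, so it stays duck-typed:

```python
def require_2d(array):
    """Hypothetical check: accept anything that reports two dimensions."""
    ndim = getattr(array, "ndim", None)
    if ndim != 2:
        raise TypeError(
            "expected a 2D array-like object, got %r" % type(array).__name__
        )
    return array


class FakeArray:
    """Stand-in for a NumPy array in this sketch; only ndim is provided."""

    def __init__(self, ndim):
        self.ndim = ndim


accepted = require_2d(FakeArray(2))   # passes: it quacks like a 2D array
```

Unlike `assert`, an explicit `raise` is not stripped out when Python runs with the `-O` flag, and callers get an exception type they can reasonably catch.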

Community
KobeJohn
  • Actually, as far as analogies with other languages' concepts go, "pointers to objects" is pretty damn good and I'm not aware of any misunderstandings or problems caused by it. But really, Python objects behave almost identically to Java objects in the sense of "names attached to objects". –  Mar 02 '14 at 13:38
  • 1
    In case you haven't seen it, have a look at [this question](http://stackoverflow.com/questions/986006/python-how-do-i-pass-a-variable-by-reference) and the miles of comments and you can see that there is actually quite a lot of disagreement there. Coming from a C background, it was an important step for me to differentiate between pointers and names. Comparing to C, names certainly are not pointers. I don't know how other languages use the word 'pointers' though. – KobeJohn Mar 02 '14 at 13:41
  • It's true that Python's names are more restricted, there is no equivalent to a pointer-to-pointer (let alone deeper nesting). But aside from that, a name behaves like a pointer to an abstract `struct PyObject`, which is hardly surprising because that's *exactly* how they are implemented. –  Mar 02 '14 at 13:42
  • zhangxaochen's answer is incorrect. Don't use assertions this way. – Neil G Mar 02 '14 at 21:37