Why can't I use x.unique() in NumPy, while x.sum() and x.mean() work?

Question

I'm learning NumPy, but there is something I don't understand. For example:

import numpy as np
ints = np.array([3,3,3,2,2,1,1,4,4])

ints.unique() # this won't work
np.unique(ints) # this works

However, some functions work both ways:

ints.sum()
np.sum(ints)

I was also reading the NumPy documentation. What's the difference between attributes and methods? Attributes return something, just as methods do.

score 1 · Accepted Answer · edited May 23 '17 at 12:34

Unlike sum, unique exists only as a free function, not as a class method (an instance method, to be precise). The difference between the two is:

obj.foo()   # instance method, obj is implicitly passed to foo()
foo(obj)    # free function,   obj is explicitly passed to foo()
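As a minimal sketch of the two call styles, using the `ints` array from the question: `sum` is reachable both as a method and as a free function, while `unique` is only a free function (at least in the NumPy releases I know of, `ndarray` defines no `unique` method).

```python
import numpy as np

ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])

# Three ways to reach the same summation.
method_total = ints.sum()              # instance method, ints passed implicitly
function_total = np.sum(ints)          # free function, ints passed explicitly
unbound_total = np.ndarray.sum(ints)   # the same method looked up on the class

# unique is only available as a free function.
has_unique_method = hasattr(ints, "unique")
unique_values = np.unique(ints)

print(method_total, function_total, unbound_total)
print(has_unique_method, unique_values)
```

The `np.ndarray.sum(ints)` line is only there to show that an instance method is an ordinary function that receives the object as its first argument.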

Have a look here for some explanation of the different variants of methods. In NumPy this is mainly a design decision, I believe, but there are reasons for some functions to be free functions. One that comes to mind is that, unlike in other technical languages (such as MATLAB), NumPy arrays can be structured or unstructured and can be flexible about containing objects of different types, for example:

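a = np.array([[1,2],[3,4]])                  # structured array (regular 2-D, integer dtype)
b = np.array([[1,2],[3,4,5]], dtype=object)  # unstructured array (object dtype; recent NumPy raises an error for ragged input without dtype=object)
c = np.array([[1,2],["abc",True]])           # mixed input types (NumPy coerces these to a single string dtype)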

In such scenarios, making every function an instance method would lead to confusing behaviour. Even the sum function behaves differently with structured and unstructured arrays:

In [18]: a.sum() # sums all elements of the array
Out[18]: 10
In [19]: b.sum() # concatenates all elements of the array
Out[19]: [1, 2, 3, 4, 5]
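To reproduce the two results above as a standalone script, here is a small sketch. It assumes `b` is built with an explicit `dtype=object`, which recent NumPy releases require for ragged input; the variable names match the ones used above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])                    # regular 2-D integer array
b = np.array([[1, 2], [3, 4, 5]], dtype=object)   # 1-D object array holding two lists

a_total = a.sum()   # numeric addition over all four elements
b_total = b.sum()   # falls back to Python's list "+", i.e. concatenation

print(a.dtype, a.shape, a_total)
print(b.dtype, b.shape, b_total)
```

The same method name therefore ends up doing integer addition in one case and list concatenation in the other, because the object array just applies `+` to whatever Python objects it holds.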

In contrast, some functions, like unique, have a much narrower scope of application. For example, unique only works for structured arrays/buffers of uniform data type and operates on the flattened (1-D) version of the array.
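A short sketch of the flattening behaviour, using a hypothetical 3x3 `grid` holding the same values as the question's `ints`. By default `np.unique` ignores the input's shape and returns a sorted 1-D result (newer releases also accept an `axis` argument, which is not shown here).

```python
import numpy as np

# Same nine values as the question's ints, arranged as a 3x3 grid.
grid = np.array([[3, 3, 3],
                 [2, 2, 1],
                 [1, 4, 4]])

# The 2-D shape is discarded: the result is sorted and one-dimensional.
values, counts = np.unique(grid, return_counts=True)

print(values)   # the distinct values, sorted
print(counts)   # how often each value occurs
```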

Attributes of NumPy arrays typically tell you about the underlying data type, shape, dimensionality, memory layout/strides and data ownership of the array, for instance:

In [20]: a=np.random.rand(3,4)
In [21]: a.flags
Out[21]: 
    C_CONTIGUOUS : True
    F_CONTIGUOUS : False
    OWNDATA : True
    WRITEABLE : True
    ALIGNED : True
    UPDATEIFCOPY : False

In [22]: a.shape
Out[22]: (3, 4)

In [23]: a.dtype
Out[23]: dtype('float64')

These are all attributes rather than array methods per se; in other words, they are properties.
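On the attribute-versus-method part of the question, a rough way to see the difference from Python itself: an attribute is read without parentheses and gives you a value directly, while a method is a callable that does nothing until you call it. The array below is just an illustrative example.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

shape_value = a.shape    # attribute: read directly, no call needed
sum_method = a.sum       # method: a callable object, not yet executed
sum_value = a.sum()      # calling the method performs the computation

is_shape_callable = callable(a.shape)   # a tuple cannot be called
is_sum_callable = callable(a.sum)       # a bound method can

# Some attributes do compute something: a.T is a transposed view of a.
transposed_shape = a.T.shape

print(shape_value, sum_value, is_shape_callable, is_sum_callable, transposed_shape)
```

So both can return something, but only methods take arguments and need the call parentheses.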

I've never noticed that function like sum will behave differently on different array. Thank you so much! — Daniel Yang, May 21 '17 at 13:36
Your `a` is a regular 2d numeric array (dtype int), `b` is an object dtype array, containing 2 lists (of different length). `np.sum(b)` and `b.sum()` use `list` add, which is a concatenate. `c` is a (2,2) array of dtype string. `sum` or `add.reduce` has not been implemented for that dtype. If `c` was created with length 2 and 3 lists, it too would be object dtype, and concatenate the lists. `[1, 2, 'abc', True, None]`. — hpaulj, May 21 '17 at 16:04
numpy object dtype arrays have a MATLAB counterpart - cell. They also are like Python lists. In numpy `structured array` usually refers to something else, an array with a compound `dtype` (earlier known as record arrays). — hpaulj, May 21 '17 at 16:07

hpaulj · Answer 2 · 2017-05-21T06:59:20.097

np.sum is a function that takes an array, or anything that can be turned into an array, and applies its sum method. See np.source(np.sum) for details.

arr.sum is a method of the arr array. For an ndarray it is compiled code. A subclassed array may have a different sum method.

In most cases where there are like-named functions and methods, a relationship like this holds.
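The function/method relationship described above can be sketched as follows. `LoggedArray` is a hypothetical subclass made up for illustration, and the hand-off from `np.sum` to the subclass's own `sum` is how the NumPy versions I'm aware of behave, so treat it as an observation rather than a documented guarantee.

```python
import numpy as np

class LoggedArray(np.ndarray):
    """Hypothetical ndarray subclass whose sum method records each call."""
    calls = []

    def sum(self, axis=None, dtype=None, out=None, **kwargs):
        LoggedArray.calls.append("sum")
        # Delegate the arithmetic to a plain ndarray view of the same data.
        return np.asarray(self).sum(axis=axis, dtype=dtype, out=out, **kwargs)

values = [1, 2, 3]

# A list has no sum method, but np.sum can still reduce it.
total_from_list = np.sum(values)

logged = np.array(values).view(LoggedArray)
total_from_method = logged.sum()       # calls LoggedArray.sum directly
total_from_function = np.sum(logged)   # np.sum hands off to LoggedArray.sum

print(total_from_list, total_from_method, total_from_function)
print(LoggedArray.calls)
```

If the function delegates as described, `LoggedArray.calls` ends up with two entries: one from the direct method call and one from `np.sum`.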

Look at the source for np.unique to see a different design. One difference that comes to mind is that unique only works with 1-D arrays, or with a flattened array. It's not as general-purpose as a method like sum or mean.

Some of these differences follow a pattern or are explained; others are probably more the result of development history. Often it is easier to add new functionality by writing a 'stand-alone' function than by adding a method to an existing class. A method is more closely integrated with the class.
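To illustrate why a stand-alone function is the easier route, here is a sketch with a hypothetical helper, `most_common`, built on top of `np.unique`. Anyone can write such a function, whereas attaching it to `ndarray` as a method from Python fails, because `ndarray` is a compiled extension type.

```python
import numpy as np

def most_common(values):
    """Hypothetical helper: return the most frequent element of an array-like."""
    uniques, counts = np.unique(np.asarray(values), return_counts=True)
    return uniques[np.argmax(counts)]

ints = np.array([3, 3, 3, 2, 2, 1, 1, 4, 4])
winner = most_common(ints)   # works immediately as a free function

# Trying to bolt the helper onto ndarray as a method is rejected.
try:
    np.ndarray.most_common = most_common
    patched = True
except TypeError:
    patched = False

print(winner, patched)
```

Adding a real method would mean changing the class itself (or subclassing it), which is the closer integration mentioned above.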

To get into more detail you'll have to spend time reading the development archives. For roughly the last 5 years, much of that can be found by searching the respective GitHub repository and its issues.
