
When using Python, I keep running into a problem that has confused me for a long time. Say I use numpy to define an array x = np.array([1, 2]).

This, I think, means that x is an instance of the class array. Moreover, the tutorial also says that [1, 2] is actually stored in x.data. But I get the data [1, 2] through the instance name x rather than through x.data.

How does this happen? Is there a link between the instance name x and x.data?

ayhan
shuairenqin

1 Answer


x and x.data are different types, though they interpret data from the same location in memory:

In [1]: import numpy as np

In [2]: x = np.array([1,2])

In [3]: type(x)
Out[3]: numpy.ndarray

In [4]: type(x.data)
Out[4]: buffer

x.data is a buffer object pointing to the start of the bytes that make up the array's data in memory, as described in the numpy docs.

When we check the underlying data type (dtype) in which the array stores its data, we see the following:

In [5]: x.dtype
Out[5]: dtype('int64')

An int64 is composed of 64 bits, or 8 bytes (8 bits in a byte). Since x holds two elements, its underlying buffer, x.data, should have length 2 × 8 = 16. We confirm that here:

In [6]: len(x.data)
Out[6]: 16
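The session here appears to be Python 2, where x.data shows up as a buffer. As a minimal sketch of the same size check under Python 3 (an assumption about your setup), the array's own attributes give the arithmetic directly. The dtype is set explicitly because the default integer type can differ between platforms. Note that under Python 3, x.data is a memoryview whose len() counts elements rather than bytes, so nbytes is the safer thing to check.

```python
import numpy as np

# dtype given explicitly; the default integer dtype is platform dependent
x = np.array([1, 2], dtype=np.int64)

bytes_per_element = x.itemsize   # 8 bytes for an int64
element_count = x.size           # 2 elements
total_bytes = x.nbytes           # itemsize * size

print(bytes_per_element, element_count, total_bytes)  # 8 2 16

# On Python 3, x.data is a memoryview; its nbytes matches the array's
print(x.data.nbytes)  # 16
```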

Lastly, we can peek into the actual values of the buffer to see how numpy stores the values in memory:

In [7]: for i in range(len(x.data)): print ord(x.data[i])
1
0
0
0
0
0
0
0
# first 8 bytes above, second 8 below
2
0
0
0
0
0
0
0

We use ord to get the integer value of each byte, since indexing the buffer returns each byte as a one-character (8-bit) string.

Since each of these bytes stores only 8 bits of information, none of the values printed by the loop will ever exceed 255, the maximum value of a byte.
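For readers on Python 3, a rough equivalent of the loop above might look like the sketch below. It assumes a little-endian int64 array (forced here with the "<i8" dtype string) and uses tobytes(), which returns a copy of the raw bytes. Iterating over a bytes object in Python 3 already yields integers, so ord is not needed.

```python
import numpy as np

# "<i8" = little-endian 8-byte integer, chosen so the byte order is predictable
x = np.array([1, 2], dtype="<i8")

raw = x.tobytes()          # copy of the 16 raw bytes backing the array
byte_values = list(raw)    # each item is an int in the range 0..255

print(byte_values)
# [1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0]
```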

The link between x and x.data is that x.data points to the location in memory of the values you see when you inspect x. numpy uses the ndarray type as an abstraction on top of this lower-level storage to make it easy to work with arrays at a high level, like getting the value of x at index 1:

In [8]: x[1]
Out[8]: 2

instead of needing to implement the correct offsetting and binary to integer conversion yourself.
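To make that concrete, here is a minimal sketch of what doing the offsetting and conversion by hand could look like in Python 3. The helper read_element is hypothetical (it is not part of numpy), and the example assumes a one-dimensional, little-endian int64 array. The last lines also illustrate the link itself: writing through x changes the bytes seen through x.data.

```python
import numpy as np

x = np.array([1, 2], dtype="<i8")  # little-endian int64, assumed for this sketch


def read_element(raw_bytes, index, stride):
    """Hypothetical helper: decode one signed integer from raw array bytes."""
    start = index * stride                    # byte offset of the element
    chunk = raw_bytes[start:start + stride]   # the bytes belonging to it
    return int.from_bytes(chunk, byteorder="little", signed=True)


stride = x.strides[0]        # bytes to step from one element to the next (8 here)
raw = x.data.tobytes()       # snapshot of the bytes behind x

manual_value = read_element(raw, 1, stride)
print(manual_value, x[1])    # 2 2

# x and x.data share memory: a write through x shows up in the buffer
x[0] = 7
first_byte_after_write = x.data.tobytes()[0]
print(first_byte_after_write)  # 7
```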

Daniel Corin
  • Wonderful answer, I appreciate it. But the explanation of the link between the instance name "x" and its attribute x.data still confuses me. Maybe I should give an additional example that confuses me. I use pytorch to define a Variable x, then compute and get its gradient x.grad, meaning grad is an attribute of the instance x. But we can use x.grad.data.zero_() to set x.grad to zero, meaning that data.zero_() is a method of x.grad. Why does an attribute have a method? Thanks a lot. – shuairenqin Jul 06 '17 at 07:58
  • `x` is just an instance of a Python `class`, so it can have arbitrary attributes defined on it, which can include other objects, functions, and values, including functions that mutate and use values stored within the object itself. If you are wondering how the attribute `data` is actually defined on `ndarray` type objects, [this post](https://stackoverflow.com/questions/10004850/python-classes-and-oop-basics) on OOP in Python might help – Daniel Corin Jul 06 '17 at 15:58
  • it's really helpful – shuairenqin Jul 07 '17 at 09:06
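
The point made in the comments, that an attribute is itself an object and can therefore have its own methods, can be sketched with two small classes. ToyStorage and ToyVariable are hypothetical names made up for this illustration. They only mimic the shape of a chained call like x.grad.data.zero_() and say nothing about how PyTorch or numpy are actually implemented.

```python
class ToyStorage:
    """Hypothetical stand-in for an object that holds raw values."""

    def __init__(self, values):
        self.values = list(values)

    def zero_(self):
        # Mutates the stored values in place, then returns itself
        for i in range(len(self.values)):
            self.values[i] = 0.0
        return self


class ToyVariable:
    """Hypothetical stand-in for an object with `data` and `grad` attributes."""

    def __init__(self, values):
        self.data = ToyStorage(values)  # an attribute that is itself an object
        self.grad = None                # filled in later with another ToyVariable


x = ToyVariable([1.0, 2.0])
x.grad = ToyVariable([0.5, 0.5])

# Each dot looks up an attribute on the object produced by the previous step:
# x -> x.grad (a ToyVariable) -> .data (a ToyStorage) -> .zero_ (its method)
x.grad.data.zero_()

print(x.grad.data.values)  # [0.0, 0.0]
print(x.data.values)       # [1.0, 2.0]
```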