Question
Please help me understand the cause of the problem in the code below, and suggest related articles to look into.
Background
As I understand it, a NumPy structured type with multiple fields, one of which is a sub-array, is defined as:
the_type = np.dtype(
[ # ndarray
(<name>, <numpy dtype>, <numpy shape>) # (name, dtype, shape)
]
)
np.shape([[1, 2]]) # 2D matrix shape (1, 2) with 1 row x 2 columns
np.shape([1]) # 1D array shape (1, )
np.shape(1) # 0D array shape () which is not a scalar
Sub-array data type: a structured data type may contain an ndarray with its own dtype and shape:
dt = np.dtype([('a', np.int32), ('b', np.float32, (3,))])
np.zeros(3, dtype=dt)
---
array([(0, [0., 0., 0.]), (0, [0., 0., 0.]), (0, [0., 0., 0.])],
dtype=[('a', '<i4'), ('b', '<f4', (3,))])
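To check my reading of the example above, here is a minimal sketch that inspects the field dtypes directly. The variable name arr is my own; everything else is taken from the example.

```python
import numpy as np

dt = np.dtype([('a', np.int32), ('b', np.float32, (3,))])
arr = np.zeros(3, dtype=dt)

# A plain field has an empty shape; a sub-array field carries its own shape.
print(dt['a'].shape)   # ()
print(dt['b'].shape)   # (3,)
print(dt['b'].base)    # float32

# The array has 3 records, and each record's 'b' holds 3 floats,
# so selecting the field gives a (3, 3) array.
print(arr.shape)       # (3,)
print(arr['b'].shape)  # (3, 3)
```

So the third element of the field tuple is the shape of the sub-array stored inside each record, not the shape of the outer array.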
Problem
The first snippet works but emits a warning, which I believe is complaining that the 1 in ("b", np.ubyte, 1) is not a proper NumPy shape and should be the 1D shape (1,). This is not an issue.
color_type = np.dtype([
("r", np.ubyte, (1,)),
("g", np.ubyte, (1)), # <--- warning
("b", np.ubyte, 1) # <--- warning
])
---
FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
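As a side note on the two warning lines, here is a small sketch of why (1) and 1 behave the same: in Python, parentheses alone do not create a tuple, only the trailing comma does. Assuming a one-element sub-array is really what is wanted, writing (1,) for every field avoids the ambiguous form altogether.

```python
import numpy as np

# (1) is just the integer 1 in parentheses; (1,) is a one-element tuple.
print(type((1)))    # <class 'int'>
print(type((1,)))   # <class 'tuple'>

# Explicit one-element sub-array shape for every field.
color_type = np.dtype([
    ("r", np.ubyte, (1,)),
    ("g", np.ubyte, (1,)),
    ("b", np.ubyte, (1,)),
])

print(color_type["r"].shape)  # (1,)
print(color_type["b"].shape)  # (1,)
```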
However, the second snippet does not work, and I would like to understand why.
- According to the warning above, I believe 16 and (16) are both treated as (16,). Is that correct, or does it depend on the dtype?
- I think a Unicode string is an array in Python, since "hoge"[3] -> 'e', so why is (16,) an error?
dt = np.dtype(
[
('first', np.unicode_, 16), # OK and no warning
('middle', np.unicode_, (16)), # OK and no warning
('last', np.unicode_, (16,)), # <----- Error
('grades', np.float64, (2,)) # OK and no warning
]
)
x = np.array(
[
('Sarah', 'Jeanette', 'Conner', (8.0, 7.0)),
('John', '', 'Conner', (6.0, 7.0))
],
dtype=dt
)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-382-3e8049d5246c> in <module>
----> 1 dt = np.dtype(
2 [
3 ('first', np.unicode_, 16),
4 ('middle', np.unicode_, (16)),
5 ('last', np.unicode_, (16,)),
ValueError: invalid itemsize in generic type tuple
Update
I now understand that I had misunderstood the dtype. In this case, what is required is not a shape but the string length.
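A minimal sketch of that conclusion, as far as I understand it: for a string field, the integer is the character length of the string type, while for the float field the tuple is a sub-array shape. I use np.str_ here in place of np.unicode_ (an alias that was removed in NumPy 2.0), and I shortened the records to two name fields for brevity.

```python
import numpy as np

dt = np.dtype([
    ('first', np.str_, 16),        # 16 is the string length, not a shape
    ('last', 'U16'),               # same type, with the length in the type string
    ('grades', np.float64, (2,)),  # (2,) is a sub-array shape
])

x = np.array(
    [
        ('Sarah', 'Conner', (8.0, 7.0)),
        ('John', 'Conner', (6.0, 7.0)),
    ],
    dtype=dt,
)

# Both string fields are 16-character Unicode scalars with an empty shape.
print(dt['first'] == np.dtype('U16'))  # True
print(dt['first'].shape)               # ()
print(dt['grades'].shape)              # (2,)
print(x['first'][0])                   # Sarah
print(x['grades'].shape)               # (2, 2)
```

In other words, a fixed-length string is a single scalar element of the field rather than an array of characters, so there is no sub-array shape to give it.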