When you need to debug this kind of thing, it's useful to break it down into simpler steps. Are you getting the slices wrong, adding two incompatible array types, adding two compatible types but trying to stick the result into an incompatible type (using += when + is OK but = is not), or adding incompatible data values? Any one of those could raise a TypeError, so how do we know which one you're doing?
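To make the "+ is OK but = is not" case concrete, here is a minimal sketch using an integer array and a float array. These are not your string arrays; numbers were chosen only because the behavior is easy to see. The out-of-place sum works because numpy picks a float result, but += has to write that float result back into the integer array, and numpy refuses that cast by default.

```python
import numpy as np

a = np.array([1, 2, 3])        # integer dtype
b = np.array([0.5, 0.5, 0.5])  # float64

# Out-of-place: numpy chooses a result dtype (float64) that can hold the sum.
total = a + b
print(total)  # [1.5 2.5 3.5]

# In-place: the float64 result would have to be stored back into the
# integer array a, which numpy's default casting rule does not allow.
failed = False
try:
    a += b
except TypeError as exc:
    failed = True
    print("in-place add failed:", exc)

print(a)  # unchanged: [1 2 3]
```

So the same two operands can succeed with + and fail with +=, which is why it's worth testing them separately.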
Well, just do them one at a time and see:
Slicing:
>>> src[:, 1]
array(['b', 'd', 'f'], dtype='|S1')
>>> src[:, 1] = ['x', 'y', 'z']
>>> src
array([['a', 'x'], ['c', 'y'], ['e', 'z']], dtype='|S1')
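If you'd rather check the slicing step in a script than at the prompt, here's a minimal sketch. It assumes Python 3, where the dtype will show up as <U1 instead of |S1. It also confirms that src[:, 1] is a view into src, which is why assigning to it changes src itself.

```python
import numpy as np

src = np.array([["a", "b"], ["c", "d"], ["e", "f"]])

column = src[:, 1]                    # basic slicing returns a view, not a copy
print(np.shares_memory(src, column))  # True

src[:, 1] = ["x", "y", "z"]           # writes through into src
print(src.tolist())                   # [['a', 'x'], ['c', 'y'], ['e', 'z']]
```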
That's fine. What about adding?
>>> src + src2
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.ndarray'
So, we've already found the same error as your more complicated case, without the slicing and without the +=, which makes things much easier to debug. Let's make it even simpler:
>>> s1, s2 = np.array('a'), np.array('b')
>>> s1 + s2
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.ndarray'
So even adding two 0D arrays fails! Can't get any simpler than that.
Maybe it's the data types. What happens if we use integers?
>>> n1, n2 = np.array(1), np.array(2)
>>> n1 + n2
3
And you can go all the way back to your original example, using integers instead of strings, and it still works fine:
>>> m1 = np.array([[1,2], [3,4], [5,6]])
>>> m2 = np.array([[7], [8], [9]])
>>> m1[:, 1] += m2[:, 0]
>>> m1
array([[ 1,  9],
       [ 3, 12],
       [ 5, 15]])
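If you end up doing this kind of bisection often, the steps can be wrapped in a small helper that runs each candidate operation in isolation and reports which ones raise TypeError. This is only a sketch; find_failing_steps, make_m1 and the step labels are hypothetical names, and each step builds fresh arrays so that one step's mutation can't affect another.

```python
import numpy as np

def find_failing_steps(steps):
    """Hypothetical helper: run each zero-argument callable on its own
    and return the labels of the ones that raised TypeError."""
    failing = []
    for label, step in steps.items():
        try:
            step()
        except TypeError as exc:
            print(f"{label}: {exc}")
            failing.append(label)
    return failing

def make_m1():
    # Fresh array for every step.
    return np.array([[1, 2], [3, 4], [5, 6]])

m2 = np.array([[7], [8], [9]])

def in_place_add_on_slice():
    m1 = make_m1()
    m1[:, 1] += m2[:, 0]

steps = {
    "slice": lambda: make_m1()[:, 1],
    "add": lambda: make_m1()[:, 1] + m2[:, 0],
    "in-place add on a slice": in_place_add_on_slice,
}
print(find_failing_steps(steps))  # []
```

To use it on your own problem, swap in your string arrays; whether the string addition step fails may depend on your numpy version, which is exactly the kind of thing this isolates.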
That should make it obvious that the problem is with data types. So, what is the data type? Just print out the array and see what numpy thinks it is:
>>> src = numpy.array([["a", "b"], ["c", "d"], ["e", "f"]])
>>> src
array([['a', 'b'], ['c', 'd'], ['e', 'f']], dtype='|S1')
That '|S1' isn't one of the friendly data types you see in the User Guide section on Data types; it's a structure definition, as explained in the section on Structured arrays. It means a 1-character fixed-length string.
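You can also ask numpy to decode the type string for you. In '|S1', the '|' means byte order doesn't apply, 'S' is the kind code for fixed-width byte strings, and 1 is the width. Note that in Python 3, an array built from ordinary str literals gets a fixed-width Unicode dtype such as <U1 instead (4 bytes per character), but the same fixed-width reasoning applies. A minimal sketch:

```python
import numpy as np

s1 = np.dtype("S1")
print(s1.str, s1.kind, s1.itemsize)  # |S1 S 1

u1 = np.dtype("U1")
print(u1.kind, u1.itemsize)          # U 4

# An array of Python 3 str literals gets a Unicode ('U') dtype.
print(np.array([["a", "b"]]).dtype.kind)  # U
```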
And that makes the problem obvious: You can't add two 1-character fixed-length strings, because the result isn't a 1-character fixed-length string.
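One way to see that fixed width directly: a value that's too long for its cell gets silently truncated on assignment, while a wider dtype has room for it. This sketch uses Python 3 str arrays (<U1 and <U2) rather than the |S types from the question.

```python
import numpy as np

cells = np.array(["b", "d", "f"])  # dtype <U1: one character per cell
cells[0] = "bx"                    # too long, so numpy silently truncates it
print(cells.tolist())              # ['b', 'd', 'f']

wider = np.array(["b", "d", "f"], dtype="U2")  # two characters per cell
wider[0] = "bx"                                # now it fits
print(wider.tolist())                          # ['bx', 'd', 'f']
```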
If you really want to make this work as-is, the simple solution is to leave them as Python strings:
>>> src = numpy.array([["a", "b"], ["c", "d"], ["e", "f"]], dtype=object)
>>> src2 = numpy.array([["x"], ["y"], ["z"]], dtype=object)
>>> src[:, 1] += src2[:, 0]
No more TypeError.
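Here is the same object-dtype approach as a standalone script, with the result printed so you can see that each cell was concatenated using ordinary Python string addition:

```python
import numpy as np

src = np.array([["a", "b"], ["c", "d"], ["e", "f"]], dtype=object)
src2 = np.array([["x"], ["y"], ["z"]], dtype=object)

# Each cell holds a plain Python str, so += falls back to str concatenation.
src[:, 1] += src2[:, 0]
print(src.tolist())  # [['a', 'bx'], ['c', 'dy'], ['e', 'fz']]
```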
Alternatively, if you explicitly give src a dtype of |S2, numpy will allow that, and the second character of each existing value will just be blank. It won't let you add another |S1 into it, but you can loop in Python, or maybe find a complicated way to trick numpy into doing it for you. Either way, you're not getting any of numpy's usual speed benefits, of course, but you are still getting the space benefits of packed fixed-size cells.
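The "loop in Python" option might look like this minimal sketch. It uses byte-string literals so the arrays really have |S dtypes under Python 3. (numpy also provides numpy.char.add for element-wise string concatenation, if you'd rather not write the loop yourself.)

```python
import numpy as np

# Packed fixed-size cells, 2 bytes each.
src = np.array([[b"a", b"b"], [b"c", b"d"], [b"e", b"f"]], dtype="S2")
src2 = np.array([[b"x"], [b"y"], [b"z"]])  # dtype |S1

print(src.dtype.str, src.itemsize)  # |S2 2

# Each cell comes back as np.bytes_ (a bytes subclass), so ordinary bytes
# concatenation works, and the 2-byte result fits back into an S2 cell.
for i in range(src.shape[0]):
    src[i, 1] = src[i, 1] + src2[i, 0]

print(src.tolist())  # [[b'a', b'bx'], [b'c', b'dy'], [b'e', b'fz']]
```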
But you might want to step back and ask what you're trying to get out of numpy here. What is your actual higher-level goal? Most of the benefit of numpy comes from using strict C/Fortran-style data types that numpy knows how to work with: it can pack them in tightly, access them without an extra dereference (and without refcounting), and operate on them in various ways, from multiplying to copying to printing, without any help from Python. But it can't do string manipulation. If you're trying to vectorize string manipulation, you're using the wrong library to do it. If you're just using numpy because someone said it's fast, well, that's true in many cases, but not in this one. If you're using numpy because some other code is handing you numpy data, but you don't want to treat it in a numpy way, there's nothing stopping you from converting it to pure Python data.
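Converting to pure Python data can be as simple as calling tolist() and doing ordinary string work from there. A minimal sketch, assuming the same src and src2 shapes as in the question:

```python
import numpy as np

src = np.array([["a", "b"], ["c", "d"], ["e", "f"]])
src2 = np.array([["x"], ["y"], ["z"]])

# tolist() turns the arrays into nested lists of ordinary Python str.
rows = src.tolist()
suffixes = [row[0] for row in src2.tolist()]

# Plain Python string manipulation from here on.
for row, suffix in zip(rows, suffixes):
    row[1] += suffix

print(rows)  # [['a', 'bx'], ['c', 'dy'], ['e', 'fz']]
```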