It's better to stick to regular NumPy arrays rather than chararrays. As the NumPy documentation notes:
The chararray class exists for backwards compatibility with
Numarray, it is not recommended for new development. Starting from
numpy 1.4, if one needs arrays of strings, it is recommended to use
arrays of dtype object_, string_ or unicode_, and use the free
functions in the numpy.char module for fast vectorized string
operations.
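As a minimal illustration of that recommendation (the array contents here are just made-up examples), a regular string array works directly with the numpy.char free functions:

import numpy as np

names = np.array(['apple', 'Avocado', 'banana'])   # plain string array, not a chararray
print(np.char.upper(names))                        # ['APPLE' 'AVOCADO' 'BANANA']
print(np.char.startswith(names, 'A'))              # [False  True False]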
Working with regular arrays, here are two approaches.
Approach #1
We could use np.count_nonzero to count the True values after comparing against the search element 'A':
np.count_nonzero(rr=='A')
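As a self-contained sketch (the sample array mirrors the data from the question; any array of single characters behaves the same way):

import numpy as np

rr = np.array(['B', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'])  # dtype '<U1'
# rr == 'A' gives a boolean array; count_nonzero counts its True entries
print(np.count_nonzero(rr == 'A'))  # 5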
Approach #2
Since the chararray holds single-character elements only, we can optimize a lot better by viewing it with a uint8 dtype and then comparing and counting. The counting is much faster because we are working with numeric data. The implementation would be:
np.count_nonzero(rr.view(np.uint8)==ord('A'))
On Python 2.x, it would be:
np.count_nonzero(np.array(rr.view(np.uint8))==ord('A'))
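A runnable sketch of the same idea, assuming plain ASCII characters; an 'S1' (bytes) array is used here so that each element maps to exactly one uint8:

import numpy as np

rr = np.array([b'B', b'B', b'B', b'A', b'B', b'A', b'A', b'A', b'B', b'A'], dtype='S1')
# Viewing the buffer as uint8 turns the string comparison into a purely numeric one:
# each 'S1' element is exactly one byte.
print(np.count_nonzero(rr.view(np.uint8) == ord('A')))  # 5

The same view also works on a '<U1' array like the one in the timings below; there each element spans four bytes, but for ASCII letters only one of them is non-zero, so counting matches of ord('A') still gives the right answer.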
Timings
Timings on the original sample data and on it scaled up 10,000x:
# Original sample data
In [10]: rr
Out[10]: array(['B', 'B', 'B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'], dtype='<U1')
# @Nils Werner's soln
In [14]: %timeit np.sum(rr == 'A')
100000 loops, best of 3: 3.86 µs per loop
# Approach #1 from this post
In [13]: %timeit np.count_nonzero(rr=='A')
1000000 loops, best of 3: 1.04 µs per loop
# Approach #2 from this post
In [40]: %timeit np.count_nonzero(rr.view(np.uint8)==ord('A'))
1000000 loops, best of 3: 1.86 µs per loop
# Original sample data scaled by 10,000x
In [16]: rr = np.repeat(rr,10000)
# @Nils Werner's soln
In [18]: %timeit np.sum(rr == 'A')
1000 loops, best of 3: 734 µs per loop
# Approach #1 from this post
In [17]: %timeit np.count_nonzero(rr=='A')
1000 loops, best of 3: 659 µs per loop
# Approach #2 from this post
In [24]: %timeit np.count_nonzero(rr.view(np.uint8)==ord('A'))
10000 loops, best of 3: 40.2 µs per loop
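To reproduce comparable numbers outside IPython, here is a minimal sketch using the standard-library timeit module (absolute timings will of course vary by machine):

import timeit

setup = (
    "import numpy as np; "
    "rr = np.repeat(np.array(['B','B','B','A','B','A','A','A','B','A']), 10000)"
)

for label, stmt in [
    ("np.sum(rr == 'A')", "np.sum(rr == 'A')"),
    ("np.count_nonzero(rr == 'A')", "np.count_nonzero(rr == 'A')"),
    ("uint8 view + count_nonzero", "np.count_nonzero(rr.view(np.uint8) == ord('A'))"),
]:
    per_loop = timeit.timeit(stmt, setup=setup, number=100) / 100
    print(f"{label:35s} {per_loop * 1e6:8.1f} µs per loop")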