I have a long unicode string:
import random
import re
import numpy as np
alphabet = range(0x0FFF)
mystr = ''.join(chr(random.choice(alphabet)) for _ in range(100))
mystr = re.sub(r'\W', '', mystr)  # raw string avoids an invalid-escape warning
I would like to view it as a series of code points, so at the moment, I am doing the following:
arr = np.array(list(mystr), dtype='U1')
I would like to manipulate the characters as numbers and eventually get some different code points back. To invert the transformation, I currently do:
mystr = ''.join(arr.tolist())
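To make "manipulate the string as numbers" concrete, here is a minimal sketch. It assumes the array is contiguous and relies on each 'U1' element occupying 4 bytes, so the same memory can be reinterpreted as unsigned 32-bit code points. The sample text and the "add 1" manipulation are only illustrative.

```python
import numpy as np

text = "abc"  # hypothetical sample input
arr = np.array(list(text), dtype='U1')

# Reinterpret the 4-byte 'U1' elements as unsigned 32-bit integers.
# This is a view on the same memory, not a copy.
codes = arr.view(np.uint32)

# Example manipulation: shift every code point up by one.
shifted = codes + 1

# Reinterpret the integers as single characters again.
new_arr = shifted.view('U1')
result = ''.join(new_arr.tolist())
```

Note that this still goes through a list on the way back to a Python string, which is exactly the step I would like to avoid.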
These transformations are reasonably fast and invertible, but the intermediate list takes up an unnecessary amount of space.
Is there a way to convert a numpy array of unicode characters to and from a Python string without converting to a list first?
Afterthoughts
I can get arr to appear as a single string with something like:
buf = arr.view(dtype='U' + str(arr.size))
This results in a 1-element array containing the entire original string. The inverse is possible as well:
buf.view(dtype='U1')
The only issue is that the type of the result is np.str_, not str.
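The round trip and the resulting types can be checked with a small sketch. It assumes a contiguous array and a string without trailing NUL characters (NumPy strips those from 'U' elements). The sample text is hypothetical. One possibility I have not fully explored is ndarray.item(), which returns a Python scalar instead of a NumPy one.

```python
import numpy as np

arr = np.array(list("hello"), dtype='U1')  # hypothetical sample input

# View the whole array as one element holding the entire string.
buf = arr.view(dtype='U' + str(arr.size))

# View it as individual characters again.
back = buf.view(dtype='U1')

element = buf[0]                               # NumPy scalar
is_numpy_str = isinstance(element, np.str_)
is_str_subclass = isinstance(element, str)     # np.str_ subclasses str

# item() copies the single element out as a Python scalar.
plain = buf.item()
```

Since np.str_ is a subclass of str, the element can already be passed to most code that expects a str; the difference only matters where the exact type is checked.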