9

I would like to convert a NumPy array of integers representing ASCII codes to the corresponding string. For example, ASCII code 97 corresponds to the character "a". I tried:

from numpy import *
a=array([97, 98, 99])
c = a.astype('string')
print c

which gives:

['9' '9' '9']

but I would like to get the string "abc".

Håkon Hægland

6 Answers

11

Another solution that does not involve leaving the NumPy world is to view the data as strings:

arr = np.array([97, 98, 99], dtype=np.uint8).view('S3').squeeze()

or, if your NumPy array does not hold 8-bit integers:

arr = np.array([97, 98, 99]).astype(np.uint8).view('S3').squeeze()

In both cases, however, you have to append the correct length to the data type (e.g. 'S3' for 3-character strings).
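To avoid hard-coding the length, the width of the string dtype can be derived from the array itself. The following is a minimal sketch, assuming a non-empty, contiguous 1-D array whose values all lie in the ASCII range (0 to 127); the names `codes`, `as_bytes` and `text` are only illustrative:

```python
import numpy as np

codes = np.array([97, 98, 99])  # any integer dtype

# Cast to one byte per element, then reinterpret the whole buffer as a
# single fixed-width byte string whose width is the array size.
as_bytes = codes.astype(np.uint8).view(f"S{codes.size}")[0]

# On Python 3 the viewed element is a bytes object, so decode it to str.
text = as_bytes.decode("ascii")
print(text)  # abc
```

Indexing with `[0]` plays the same role as `squeeze()` above: it extracts the single element of the viewed array.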

coderforlife
10
print("".join([chr(item) for item in a]))

Output:

abc
Ashoka Lella
  • Thanks Ashoka for the nice solution. I was too focused on trying to use a NumPy function, but this seems like an elegant solution. – Håkon Hægland Jul 19 '14 at 08:45
7

Create an array of bytes and decode the byte representation using the ASCII codec:

np.array([98,97,99], dtype=np.int8).tostring().decode("ascii")

Note that tostring is badly named: it actually returns bytes, which happen to be a str in Python 2. In Python 3 you get the bytes type back, which needs to be decoded.

jtaylor
4
import numpy as np
np.array([97, 98, 99], dtype='b').tobytes().decode("ascii")

Output:

'abc'

The 'b' type code denotes a signed byte; see "Data type objects (dtype)" in the NumPy documentation.

tostring() has been deprecated since NumPy 1.19.0. Use tobytes() instead.
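One pitfall with the byte-based approach is worth illustrating: tobytes() dumps the raw buffer, so the array must hold one byte per element before decoding. The sketch below assumes the codes start out as 64-bit integers (set explicitly here so the result does not depend on the platform default); the variable names are illustrative:

```python
import numpy as np

codes = np.array([97, 98, 99], dtype=np.int64)  # 8 bytes per element

# The raw buffer of an int64 array is 3 * 8 = 24 bytes long, so decoding
# it directly would interleave the characters with zero bytes.
raw = codes.tobytes()
print(len(raw))  # 24

# Casting to an 8-bit type first gives exactly one byte per character.
text = codes.astype(np.uint8).tobytes().decode("ascii")
print(text)  # abc
```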

ivanbgd
1
from numpy import array

a = array([97, 98, 99])
print("{0:c}{1:c}{2:c}".format(a[0], a[1], a[2]))

Of course, join and a list comprehension can be used here as well.
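As a rough sketch of that combination, the `c` format code can be applied element by element so that the array length does not need to be known in advance. The explicit `int(x)` conversion is a cautious assumption here, so that formatting always operates on a plain Python integer:

```python
from numpy import array

a = array([97, 98, 99])

# Format each code as a character and join the pieces into one string.
s = "".join("{:c}".format(int(x)) for x in a)
print(s)  # abc
```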

Boris Verkhovskiy
nouseforname
1

Solutions that rely on Python loops or string formatting will be slow for large datasets. If you know that all of your data are ASCII, a faster approach could be to use fancy indexing:

import numpy as np
a = np.array([97, 98, 99])
np.array([chr(x) for x in range(128)])[a]  # 128 entries cover ASCII codes 0-127
# array(['a', 'b', 'c'], dtype='<U1')

An advantage is that it works for arbitrarily shaped arrays.
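To illustrate the point about shapes, here is a small sketch that applies the same lookup table to a 2-D array and then joins each row into a string. The names `table`, `codes`, `chars` and `rows` are illustrative, and all codes are assumed to lie in the range 0 to 127:

```python
import numpy as np

# Lookup table: index i holds the character with ASCII code i.
table = np.array([chr(x) for x in range(128)])

codes = np.array([[97, 98],
                  [99, 100]])

# Fancy indexing keeps the shape of the index array.
chars = table[codes]
print(chars.shape)  # (2, 2)

# Joining per row still uses a Python loop, but only over the rows.
rows = ["".join(row) for row in chars]
print(rows)  # ['ab', 'cd']
```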

nth