
I have two arrays of strings:

In [51]: r['Z']
Out[51]: 
array(['0', '0', '0', ..., '0', '0', '0'], 
      dtype='|S1')

In [52]: r['Y']
Out[52]: 
array(['X0', 'X0', 'X0', ..., 'X0', 'X1', 'X1'], 
      dtype='|S2')

What is the difference between S1 and S2? Is it just that they hold entries of different length?

What if my arrays have strings of different lengths?

Where can I find a list of all possible dtypes and what they mean?

Amelio Vazquez-Reina

2 Answers


See the dtypes documentation.

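The |S1 and |S2 strings are data type descriptors. The first means every element is a fixed-width byte string of 1 byte; the second, a fixed-width byte string of 2 bytes. Shorter values are padded with null bytes, and longer values are truncated to fit. The | pipe symbol is the byte-order flag; byte order does not apply to single-byte characters, so it is set to |, meaning "not applicable".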
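A minimal sketch, assuming a reasonably recent NumPy: `np.dtype` parses a descriptor string, and its `kind`, `itemsize` and `byteorder` attributes expose the three parts described above. `np.typecodes` is a small built-in dict of type-code characters; it is not a substitute for the dtypes documentation, just a quick way to see the codes from the interpreter.

```python
import numpy as np

# Inspect the two descriptors from the question.
for descr in ("|S1", "|S2"):
    dt = np.dtype(descr)
    # kind 'S'  -> fixed-width byte string
    # itemsize  -> bytes reserved per element (the number after 'S')
    # byteorder -> '|' means byte order is not applicable
    print(descr, dt.kind, dt.itemsize, dt.byteorder)
# prints "|S1 S 1 |" and then "|S2 S 2 |"

# Single-character type codes NumPy knows about, grouped by category;
# 'S' (bytes), 'U' (unicode) and 'O' (object) appear in the "All" entry.
print(np.typecodes["All"])
```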

Martijn Pieters
  • I thought this rang a bell - http://stackoverflow.com/questions/13997087/what-are-the-available-datatypes-for-dtype-with-numpys-loadtxt-an-genfromtxt – Jon Clements Feb 09 '13 at 16:38
  • Thanks! What would happen if my arrays had strings of different lengths? Is there a special S type for that? – Amelio Vazquez-Reina Feb 09 '13 at 17:09
  • @user273158: Arrays can *only* contain fixed length items; variable-length strings are not supported, not as `S` anyway. You can store object references (`dtype('O')`) where the objects *can be* Python strings, though. – Martijn Pieters Feb 09 '13 at 19:11
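To illustrate what "fixed length" means for strings of different lengths, here is a sketch assuming Python 3 and a recent NumPy (where `S` arrays hold `bytes`; the sample values are made up). NumPy chooses the item size from the longest input, pads shorter values with null bytes, and silently truncates longer values assigned later.

```python
import numpy as np

# Byte strings of different lengths: NumPy sizes the dtype for the longest.
a = np.array([b"X0", b"X10", b"X"])
print(a.dtype)   # |S3

# Shorter entries are null-padded in memory, but indexing strips the padding.
print(a[2])      # b'X'

# Assigning a value longer than the fixed width silently truncates it.
a[0] = b"X12345"
print(a[0])      # b'X12'
```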

To store variable-length strings in a NumPy array, you can store them as Python objects (dtype object). For example:

In [455]: import numpy as np

In [456]: x = np.array(('abagd', 'ds', 'asdfasdf'), dtype=np.object_)

In [457]: x[0]
Out[457]: 'abagd'

In [459]: list(map(len, x))
Out[459]: [5, 2, 8]

In [460]: x[1] == 'ds'
Out[460]: True

In [461]: x
Out[461]: array(['abagd', 'ds', 'asdfasdf'], dtype=object)

In [462]: str(x)
Out[462]: "['abagd' 'ds' 'asdfasdf']"

In [463]: x.tolist()
Out[463]: ['abagd', 'ds', 'asdfasdf']

In [464]: list(map(type, x))
Out[464]: [str, str, str]
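To make the trade-off concrete, here is a sketch assuming Python 3 and a recent NumPy (where an array built from `str` values gets a fixed-width Unicode `U` dtype rather than the `S` dtype of the Python 2 era question). The object array stores one reference per element, so every slot is pointer-sized and each string keeps its full length; the fixed-width array reserves room for the longest string and truncates anything longer.

```python
import numpy as np

words = ("abagd", "ds", "asdfasdf")

fixed = np.array(words)                # fixed-width Unicode sized for the longest: U8
objs = np.array(words, dtype=object)   # one Python-object reference per element

# Fixed-width: every slot holds 8 characters at 4 bytes each.
print(fixed.dtype, fixed.itemsize)     # <U8 32 on a little-endian machine

# Object: every slot is a pointer, whatever the string's length
# (8 bytes on a typical 64-bit build).
print(objs.dtype, objs.itemsize)

# Assign the same 15-character string into each array.
objs[1] = "variable-length"
fixed[1] = "variable-length"

print(objs[1])    # variable-length  (stored whole)
print(fixed[1])   # variable         (truncated to 8 characters)
```

The cost of the object dtype is that the strings live outside the array as separate Python objects, so vectorized NumPy string operations no longer apply directly.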
MrCartoonology