>>> from io import StringIO
>>> import numpy as np
>>> s = StringIO("1,1.3,abcde")
>>> data = np.genfromtxt(s, dtype=[('myint','i8'),('myfloat','f8'),
... ('mystring','S5')], delimiter=",")
>>> data
array((1, 1.3, 'abcde'),
      dtype=[('myint', '<i8'), ('myfloat', '<f8'), ('mystring', '|S5')])

My question is about the dtype argument. I don't understand what dtype="i8,f8,|S5" stands for. I can tell that i is an integer, f is a float and S is a string, but what is the 8 in i8? At first I took it to be the number of bytes, but then how is S5 possible? I understand that dtype specifies the data types so that we can read from a CSV file, but can someone give some insight into these data types?

Akash Chandra
  • Read [the documentation](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.dtypes.html). – sco1 Dec 29 '17 at 18:53
  • "i8,f8,|S5" is shorthand for the full dtype that your `data` display shows. – hpaulj Dec 29 '17 at 19:11

1 Answer

The 8 in i8 or f8 is the number of bytes. There are several different ways to express the same datatype in numpy. The strings you see from np.genfromtxt are in the compact format: a < or > sign in front means little or big endian (see the documentation), followed by i for integer or f for floating point, and then the number of bytes.
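As a rough illustration of "several different ways", the sketch below compares a few spellings of the same dtype. The field names f0, f1, f2 are the defaults numpy assigns when a comma-separated string gives no names; the variable names are only for illustration.

```python
import numpy as np

# Three spellings of the same 8-byte integer type.
same = np.dtype("i8") == np.dtype("int64") == np.dtype(np.int64)
print(same)

# A comma-separated string is shorthand for a structured dtype
# whose fields get the default names f0, f1, f2.
shorthand = np.dtype("i8,f8,S5")
explicit = np.dtype([("f0", "i8"), ("f1", "f8"), ("f2", "S5")])
print(shorthand.names)
print(shorthand == explicit)
```

Passing a list of (name, type) tuples, as in the question, is the same idea with your own field names instead of the defaults.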

The longer datatype names have the size in bits instead of bytes, meaning that i8 is int64, f4 is float32 and so on. E.g.:

>>> np.dtype('i8')
dtype('int64')
>>> np.dtype('f4')
dtype('float32')
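To check the bytes-versus-bits relationship yourself, a small sketch like this one prints the long name and the size of a few compact codes. The choice of codes and the `sizes` dictionary are just for illustration.

```python
import numpy as np

sizes = {}
for code in ("i2", "i8", "f4", "f8", "c16"):
    dt = np.dtype(code)
    # itemsize is in bytes; the long name carries the size in bits.
    sizes[code] = (dt.name, dt.itemsize)
    print(code, dt.name, dt.itemsize, "bytes =", dt.itemsize * 8, "bits")
```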

By default these use the machine's native byte order, which is little endian on most common hardware. If you ask for big endian explicitly, np.dtype does not, as far as I know, return the long form:

>>> np.dtype('>c16')
dtype('>c16') 
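Byte order only changes how the value is laid out in memory, not the value itself. A minimal sketch, using a 2-byte integer so the difference is easy to see:

```python
import numpy as np

# The integer 1 stored as a 2-byte value in each byte order.
big = np.array([1], dtype=">i2").tobytes()
little = np.array([1], dtype="<i2").tobytes()
print(big, little)

# A dtype given without < or > uses native order, reported as '='.
native = np.dtype("i8")
print(native.byteorder)

# The values compare equal regardless of the storage order.
same_values = np.array([1, 2], dtype=">i8") == np.array([1, 2], dtype="<i8")
print(same_values)
```

Note that numpy may report an explicitly requested order as '=' when it happens to match the machine's native order, so the `byteorder` attribute of '<i8' or '>i8' depends on the hardware you run this on.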

Strings are a special case: the number is the maximum length of the string, so S5 holds at most 5 characters (one byte each for this bytes type). See this question for more details.
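A short sketch of what the length means in practice. The sample values are made up; the U5 line is included only as a contrast, since the unicode type stores 4 bytes per character.

```python
import numpy as np

# Values longer than 5 bytes are silently truncated; shorter ones are kept.
clipped = np.array([b"abcdefgh", b"ab"], dtype="S5")
print(clipped)

# S5 occupies 5 bytes per element, U5 occupies 5 characters * 4 bytes.
s_bytes = np.dtype("S5").itemsize
u_bytes = np.dtype("U5").itemsize
print(s_bytes, u_bytes)
```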

tiago