genfromtxt acting strangely when datatypes are specified individually

Question

I'm currently loading in a file as below:

1400,,,2001,101,1000,1,07,08,332,8,2,,,,1,,9,,21,,36,,39,,53,,68,,95,,,,,0,8,,,
1400,,,2001,101,1000,2,07,08,222,11,1,,,,1,,1,,2,,12,,13,,21,,48,,112,,,,,0,11,,,
1400,,,2001,101,1001,1,07,08,24,0,0,,,,0,,1,,3,,7,,2,,3,,3,,5,,,,,0,0,,,
1400,,,2001,101,1001,2,07,08,14,0,0,,,,0,,0,,0,,3,,1,,4,,0,,6,,,,,0,0,,,
1400,,,2001,101,1002,1,07,08,0,0,0,,,,0,,0,,0,,0,,0,,0,,0,,0,,,,,0,0,,,
1402,,,2001,101,I25,1,07,08,0,0,0,,,,0,,0,,0,,0,,0,,0,,0,,0,,,,,0,0,,,
1401,,,2001,101,I26,2,07,08,0,0,0,,,,0,,0,,0,,0,,0,,0,,0,,0,,,,,0,0,,,

All of the columns should be ints, except for the 6th column (values like 1000, I25), which I've set to be a string. I load the file as follows:

data = np.genfromtxt(sys.argv[1], dtype=(int,int,int,int,int,"|S25",int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int,int), skip_header=1, delimiter=",")

I have to do this because otherwise genfromtxt treats every column as an int and sets the 6th column to -1.

I then build a mask so that only rows whose first column is 1400 are selected:

mask_country = (data[:,0] == 1400)

This, however, gives the error:

Traceback (most recent call last):
  File "Python/iw2.py", line 14, in <module>
    mask_country = (data[:,0] == 1400)
IndexError: too many indices

It's strange, because if I get rid of the dtype=() from the genfromtxt line, OR just specify all the columns as int with dtype=int, it runs perfectly.

Why does specifying the data type for the columns individually result in this error?

If I don't set the mask I can print 'data' and it seems to be setting things correctly, as the last line is as follows:

(1401, -1, -1, 2001, 101, 'I26', 2, 7, 8, 0, 0, 0, -1, -1, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, 0, -1, -1, -1, -1, 0, 0, -1, -1, -1)]

score 0 · Accepted Answer · answered Dec 25 '13 at 00:49

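When you specify the datatype like that, you create a 1D array, not a 2D array. Each element in your array is a record (one per row of the file) consisting of a series of other elements, one per column. Because the array has only one dimension, data[:,0] asks for one index too many, hence the IndexError.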
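To make this concrete, here is a minimal sketch using a shortened, hypothetical three-column sample (int, string, int) instead of the full 39-column file. It assumes Python 3 and a reasonably recent NumPy, and uses "U25" rather than "|S25" so the string column holds text instead of bytes. The field names f0, f1, f2 are the defaults genfromtxt assigns when no names are given.

```python
import io
import numpy as np

# Hypothetical, shortened sample: int, string, int (not the real file).
text = "1400,1000,1\n1402,I25,1\n1400,1001,2\n"

# One dtype per column, as in the question, yields a structured array.
data = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=(int, "U25", int))

print(data.shape)        # a single dimension: one record per row
print(data.dtype.names)  # default field names assigned by genfromtxt

# Select a column by field name instead of data[:, 0].
mask_country = data["f0"] == 1400
print(data[mask_country])
```

With a structured array, a column is selected by its field name and rows are selected with a 1D boolean mask, so the original filtering still works once data[:,0] becomes data["f0"].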

mgilson

I see, thank you. How would I best go about specifying that the 6th column is a string, but all other columns ints, whilst retaining it as a 2 dimensional array? – James Dec 25 '13 at 00:53
The answer to this last comment is in your newer question about `genfromtxt`, http://stackoverflow.com/questions/20771694 – hpaulj Dec 26 '13 at 00:23
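
For readers who do not follow the link, one possible way to keep a plain 2D integer array is to read the integer columns and the string column in two separate passes with usecols. This is only a sketch under assumptions and not necessarily what the linked question recommends. The seven-column sample and the names int_cols, ints and labels are hypothetical.

```python
import io
import numpy as np

# Hypothetical 7-column excerpt of the rows in the question.
text = (
    "1400,,,2001,101,1000,1\n"
    "1400,,,2001,101,1001,2\n"
    "1402,,,2001,101,I25,1\n"
)

# Every column except the string one (index 5).
int_cols = (0, 1, 2, 3, 4, 6)

# First pass: a single dtype for all selected columns gives a regular 2D array.
ints = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=int, usecols=int_cols)

# Second pass: only the string column, as a 1D array aligned row by row.
labels = np.genfromtxt(io.StringIO(text), delimiter=",", dtype=str, usecols=(5,))

# 2D indexing works again, and the same mask applies to both arrays.
mask_country = ints[:, 0] == 1400
print(ints[mask_country])
print(labels[mask_country])
```

The trade-off is that the file is parsed twice and the string column lives in its own array, but the integer data keeps the data[:,0] style of indexing.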