What you get from
array = np.genfromtxt("testQUADs", delimiter=8, dtype="|S8, i4, i4, i4, f8, f8, f8, |S8")
is a structured array.
array.dtype
will look like
np.dtype("|S8, i4, i4, i4, f8, f8, f8, |S8")
array.shape
is (n,), where n is the number of rows; it's a 1d array with 8 fields.
array[0]
is one element, or record, of this array; look at its dtype. Don't worry about its type (void is just the type of a compound dtype record).
array['f0']
is the first field, all rows, in this case an array of strings.
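To make these attributes concrete, here is a small sketch that builds the same kind of array from an in-memory list of lines rather than the testQUADs file. The sample values and the left-justified column layout are my assumptions, not taken from your file; on Python 3 the S8 fields come back as bytes.

```python
import numpy as np

# Hypothetical sample record: eight fields, each padded to 8 characters.
fields = ["QUAD4", "1", "123456", "12345678",
          "1.234567", "1.234567", "1.234567", ""]
line = "".join(f"{f:<8}" for f in fields)
txt = [line, line, line]  # a list of lines stands in for the file

arr = np.genfromtxt(txt, delimiter=8,
                    dtype="S8, i4, i4, i4, f8, f8, f8, S8")

print(arr.dtype.names)  # field names, auto-generated as f0 ... f7
print(arr.shape)        # 1d: one entry per row
print(type(arr[0]))     # a single record is a numpy.void scalar
print(arr["f1"])        # one field, all rows
```

Indexing with an integer picks a record; indexing with a field name picks a column. The two can be combined, e.g. arr["f1"][0] or arr[0]["f1"].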
You may need to read the dtype and structured array docs in more depth. Many SO posters have been confused about the 1d structured array that genfromtxt produces.
genfromtxt
reads the file just like your code does, and splits each line into strings. Then it converts those strings according to the dtype, and collects the results in a list. At the end it assembles that list into array, this 1d array of the specified dtype. Since it is doing more than your code, it's not surprising that it is a bit slower.
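The steps just described can be sketched in plain Python. This is only an illustration of the idea, not the actual genfromtxt source; parse_fixed_width is a hypothetical helper, and the sample line is an assumed layout with eight 8-character fields.

```python
import numpy as np

def parse_fixed_width(lines, width, dtype):
    """Hypothetical helper: split, convert, collect, then build the array."""
    dt = np.dtype(dtype)
    kinds = [dt[name].kind for name in dt.names]  # 'S', 'i' or 'f' per field
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        # 1. split the line into fixed-width strings
        pieces = [line[i:i + width] for i in range(0, len(line), width)]
        # 2. convert each string according to the field's dtype
        row = []
        for text, kind in zip(pieces, kinds):
            if kind == "i":
                row.append(int(text))
            elif kind == "f":
                row.append(float(text))
            else:
                row.append(text.encode())  # keep the padding, as bytes
        # 3. collect each record as a tuple
        rows.append(tuple(row))
    # 4. assemble the list of tuples into a 1d structured array
    return np.array(rows, dtype=dt)

# Assumed sample data, padded to 8 characters per field.
fields = ["QUAD4", "1", "123456", "12345678",
          "1.234567", "1.234567", "1.234567", ""]
line = "".join(f"{f:<8}" for f in fields)
B = parse_fixed_width([line, line, line], 8,
                      "S8, i4, i4, i4, f8, f8, f8, S8")
print(B.shape, B.dtype.names)
```

Note that the records have to be tuples; a list of lists with a compound dtype does not produce the same result.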
loadtxt
does much the same, with less power in certain areas.
pandas
has a csv reader that is faster because it uses more compiled code. But a dataframe isn't any easier to understand than a structured array.
Your 2 methods don't produce the same thing:
In [105]: line = "QUAD4   1       123456  123456781.2345671.2345671.234567        "
In [106]: txt=[line,line,line] # a list of lines instead of a file
In [107]: A = np.genfromtxt(txt, delimiter=8, dtype="|S8, i4, i4, i4, f8, f8, f8, |S8")
In [108]: A
Out[108]:
array([('QUAD4   ', 1, 123456, 12345678, 1.234567, 1.234567, 1.234567, '        '),
       ('QUAD4   ', 1, 123456, 12345678, 1.234567, 1.234567, 1.234567, '        '),
       ('QUAD4   ', 1, 123456, 12345678, 1.234567, 1.234567, 1.234567, '        ')],
      dtype=[('f0', 'S8'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8'), ('f7', 'S8')])
Note the dtype, and that the array has 3 elements.
Your line parser:
In [109]: fn=txt[:]
In [110]: for i, line in enumerate(fn):
   .....:     l = [line[0:8], line[8:16], line[16:24], line[24:32], line[32:40], line[40:48], line[48:56], line[56:64], line[64:72], line[72:80]]
   .....:     fn[i] = [l[0].strip(), int(l[1]), int(l[2]), int(l[3]), float(l[4]), float(l[5]), float(l[6]), l[7].strip()]
   .....:
In [111]: fn
Out[111]:
[['QUAD4', 1, 123456, 12345678, 1.234567, 1.234567, 1.234567, ''],
['QUAD4', 1, 123456, 12345678, 1.234567, 1.234567, 1.234567, ''],
['QUAD4', 1, 123456, 12345678, 1.234567, 1.234567, 1.234567, '']]
In [112]: A1=np.array(fn)
In [113]: A1
Out[113]:
array([['QUAD4', '1', '123456', '12345678', '1.234567', '1.234567',
        '1.234567', ''],
       ['QUAD4', '1', '123456', '12345678', '1.234567', '1.234567',
        '1.234567', ''],
       ['QUAD4', '1', '123456', '12345678', '1.234567', '1.234567',
        '1.234567', '']],
      dtype='|S8')
fn
is a list of lists, which can hold values of diverse types. But when you put it into an array, numpy has to pick one common dtype, so it turns everything into strings.
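A quick way to see that conversion, using made-up rows shaped like the parsed lines above. The dtype=object variant is my addition for contrast; it keeps the Python values but gives up numpy's numeric speed.

```python
import numpy as np

# Assumed rows, mixing str, int and float like the parsed lines.
rows = [["QUAD4", 1, 123456, 12345678, 1.234567, 1.234567, 1.234567, ""]] * 3

as_strings = np.array(rows)                # common dtype is a string type
as_objects = np.array(rows, dtype=object)  # each cell stays a Python object

print(as_strings.dtype, as_strings.shape)  # 2d array of strings
print(repr(as_strings[0, 1]))              # the int 1 has become a string
print(type(as_objects[0, 1]), type(as_objects[0, 4]))
```

Neither of these is a structured array: both are 2d, with one dtype for every cell.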
I could turn your fn
list into a structured array with:
In [120]: np.array([tuple(l) for l in fn],dtype=A.dtype)
Out[120]:
array([('QUAD4', 1, 123456, 12345678, 1.234567, 1.234567, 1.234567, ''),
       ('QUAD4', 1, 123456, 12345678, 1.234567, 1.234567, 1.234567, ''),
       ('QUAD4', 1, 123456, 12345678, 1.234567, 1.234567, 1.234567, '')],
      dtype=[('f0', 'S8'), ('f1', '<i4'), ('f2', '<i4'), ('f3', '<i4'), ('f4', '<f8'), ('f5', '<f8'), ('f6', '<f8'), ('f7', 'S8')])
That's the same as A
from genfromtxt
except for the padding of the strings.
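If the padding matters, there are at least two ways to get rid of it. The sketch below uses the autostrip parameter of genfromtxt and, alternatively, np.char.strip on a field that has already been loaded. The sample line is again an assumed layout, not your file.

```python
import numpy as np

# Assumed sample data, padded to 8 characters per field.
fields = ["QUAD4", "1", "123456", "12345678",
          "1.234567", "1.234567", "1.234567", ""]
line = "".join(f"{f:<8}" for f in fields)
txt = [line, line, line]
dt = "S8, i4, i4, i4, f8, f8, f8, S8"

padded = np.genfromtxt(txt, delimiter=8, dtype=dt)
# Option 1: let genfromtxt strip every field before converting it.
trimmed = np.genfromtxt(txt, delimiter=8, dtype=dt, autostrip=True)
# Option 2: strip one string field after the fact.
manual = np.char.strip(padded["f0"])

print(padded["f0"][0], trimmed["f0"][0], manual[0])
```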
Here's a variation that might be useful, though it might also stretch your knowledge of structured arrays:
In [132]: dt=np.dtype('a8,(3)i,(3)f,a8')
In [133]: A = np.genfromtxt(txt, delimiter=8, dtype=dt)
A
now has 4 fields, two of which hold multiple values.
A['f1']
will return an (n,3) array of ints.
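To show how such a nested dtype behaves without depending on a file, this sketch builds a small array by hand. The record values are made up, and I spell the dtype as S8/i4/f8 here (the '(3)f' above is a 4-byte float, so this is a slight variation).

```python
import numpy as np

# Four fields; the two middle ones each hold a length-3 subarray.
dt = np.dtype("S8,(3)i4,(3)f8,S8")

# Hypothetical record: the subarray fields are given as nested tuples.
record = (b"QUAD4", (1, 123456, 12345678),
          (1.234567, 1.234567, 1.234567), b"")
arr = np.array([record] * 3, dtype=dt)

print(arr.dtype.names)  # four field names
print(arr.shape)        # still 1d, one record per row
print(arr["f1"].shape)  # rows x 3, a regular 2d int array
print(arr["f2"].mean(axis=0))
```

Once the numbers sit in arr['f1'] and arr['f2'] they are ordinary 2d arrays, so the usual slicing and math work on them directly.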