
I'm trying to read a numpy array from a large binary file. Each record contains 7330 floats (float32), followed by a long (int64) I want to ignore, and then an int (int32). I create a dtype as follows:

import numpy as np

dt = [(str(n), 'f4') for n in range(7330)]
dt += [('junk', 'i8'), ('label', 'i4')]
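As a quick sanity check, the record size implied by this dtype can be computed directly. This is a sketch that assumes the file has no header and the records are packed back to back with no padding (the default when `np.dtype` is built from a list without `align=True`).

```python
import numpy as np

# Same field list as in the question.
dt = [(str(n), 'f4') for n in range(7330)]
dt += [('junk', 'i8'), ('label', 'i4')]
rec = np.dtype(dt)

# Without align=True the fields are packed, so one record occupies
# 7330 * 4 + 8 + 4 bytes and has 7330 + 2 named fields.
print(rec.itemsize)      # 29332
print(len(rec.names))    # 7332
```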

and then read the file via

d = np.fromfile(file_name, dtype=np.dtype(dt))
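A small, self-contained round trip makes the resulting shape easy to see. This sketch writes a synthetic file with `ndarray.tofile` and reads it back with `np.fromfile`; `N_FLOATS`, `N_RECORDS` and the temporary file are hypothetical stand-ins for the real 7330 floats and the real data file.

```python
import os
import tempfile

import numpy as np

N_FLOATS = 5     # hypothetical stand-in for the 7330 floats per record
N_RECORDS = 4    # hypothetical number of records in the synthetic file

dt = [(str(n), 'f4') for n in range(N_FLOATS)]
dt += [('junk', 'i8'), ('label', 'i4')]
rec = np.dtype(dt)

# Build a synthetic structured array and dump its raw bytes to disk.
src = np.zeros(N_RECORDS, dtype=rec)
for n in range(N_FLOATS):
    src[str(n)] = np.arange(N_RECORDS) + n / 10
src['junk'] = -1
src['label'] = np.arange(N_RECORDS) * 10

with tempfile.TemporaryDirectory() as tmp:
    file_name = os.path.join(tmp, 'records.bin')
    src.tofile(file_name)
    d = np.fromfile(file_name, dtype=rec)

print(d.shape)        # (4,): one element per record, not per number
print(type(d[0]))     # numpy.void: a single record
print(len(d[0]))      # 7: N_FLOATS floats + 'junk' + 'label'
```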

It works, but I get back a one-dimensional array of records instead of the 2-D array I want. More specifically, I get back an array with d.shape == (58134,), where d[0] is of type numpy.void and len(d[0]) == 7332 (7330 floats, the long I will ignore, and the int). I want an array of shape (58134, 7332).

I can't use d.reshape(-1, 7332) because d is one-dimensional (each of its 58134 elements is a whole record), so I end up converting it with the ugly and somewhat absurd
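The reason `reshape` cannot help is that it only regroups existing elements, and each element here is an entire record. A minimal sketch with a hypothetical 5-float record and 4 records:

```python
import numpy as np

N_FLOATS = 5  # hypothetical stand-in for 7330
dt = [(str(n), 'f4') for n in range(N_FLOATS)] + [('junk', 'i8'), ('label', 'i4')]
d = np.zeros(4, dtype=dt)  # 4 records, playing the role of the (58134,) array

# reshape only regroups existing elements: d holds 4 elements (records),
# not 4 * (N_FLOATS + 2) numbers, so asking for rows of 7 cannot work.
try:
    d.reshape(-1, N_FLOATS + 2)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print('reshape(-1, 7) failed:', reshape_failed)   # True

# (n, 1) is allowed, but every element is still a whole record.
col = d.reshape(-1, 1)
print(col.shape, col.dtype == d.dtype)   # (4, 1) True
```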

import pandas as pd

nparray = pd.DataFrame.from_records(d).to_numpy()

which seems just ridiculous. What am I doing wrong? Thanks!

juanpa.arrivillaga
  • Please provide a sample of the 1-D array you get and what it should look like instead, so that we can better understand the problem. – G. Anderson Sep 03 '20 at 21:08
  • What is the difference between a 1-D structured array `(n,)` and a two-dimensional `(n, 1)` array of records, except for the redundant axis? The 1-D array seems like the sane thing to have; if you really want `(n, 1)`, just reshape it. – juanpa.arrivillaga Sep 03 '20 at 21:11
  • 1
    `dtype` is the "type" of a single "element". Each "element" in your array has the information of 7330 floats, a long and an int, but it is still just an "element" of the resulting structured array. – darcamo Sep 03 '20 at 21:24
  • @darcamo: That's exactly right, of course. I guess my question should have been "how do I make this a 2-D array" instead of "why is this a 1-D array". :) – Matt Ginsberg Sep 03 '20 at 21:26
  • Maybe something like this question https://stackoverflow.com/questions/5957380/convert-structured-array-to-regular-numpy-array – darcamo Sep 03 '20 at 21:31
  • `dt = [('data', 'f4', 7330), ('junk','i8'), ('label','i4')]` might also be useful. It will create 3 fields. `arr['data']` should then be the desired 2d array of floats. – hpaulj Sep 03 '20 at 23:47
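A sketch of hpaulj's sub-array suggestion, assuming the same packed layout as in the question. A `('data', 'f4', (7330,))` field occupies exactly the same bytes as 7330 consecutive `'f4'` fields, so the same file can be read with either dtype. Here `tobytes` and `np.frombuffer` stand in for the file; with real data, `np.fromfile(file_name, dtype=sub_dt)` would play that role.

```python
import numpy as np

N_FLOATS = 7330

# The question's dtype: 7330 separate scalar fields.
flat_dt = np.dtype([(str(n), 'f4') for n in range(N_FLOATS)]
                   + [('junk', 'i8'), ('label', 'i4')])

# hpaulj's suggestion: one field holding a (7330,) float32 sub-array.
sub_dt = np.dtype([('data', 'f4', (N_FLOATS,)), ('junk', 'i8'), ('label', 'i4')])
assert flat_dt.itemsize == sub_dt.itemsize == 29332   # same on-disk layout

# Fake a few records with the original dtype and reinterpret their raw bytes.
old = np.zeros(3, dtype=flat_dt)
old['0'] = [1.5, 2.5, 3.5]
old['label'] = [7, 8, 9]
raw = old.tobytes()                    # stand-in for the contents of the file

d = np.frombuffer(raw, dtype=sub_dt)
data = d['data']       # shape (3, 7330), float32, a view into d
labels = d['label']    # shape (3,), int32
print(data.shape, data[:, 0], labels)
```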

1 Answer


It turns out that `numpy.lib.recfunctions.structured_to_unstructured` does exactly this. Thanks to darcamo for pointing me in that direction.
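A minimal sketch of that function on a small hypothetical record (5 floats instead of 7330). With no `dtype` argument, `structured_to_unstructured` casts every field to a common type, so mixing `f4`, `i8` and `i4` yields `float64`. Selecting only the float fields first keeps `float32` and leaves the label as a separate integer array.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

N_FLOATS = 5  # hypothetical stand-in for 7330
float_names = [str(n) for n in range(N_FLOATS)]
dt = np.dtype([(name, 'f4') for name in float_names]
              + [('junk', 'i8'), ('label', 'i4')])

d = np.zeros(4, dtype=dt)          # stand-in for np.fromfile(file_name, dtype=dt)
d['0'] = [0.5, 1.5, 2.5, 3.5]
d['label'] = [1, 2, 3, 4]

# Every field becomes a column; float32/int64/int32 are promoted to float64.
full = rfn.structured_to_unstructured(d)
print(full.shape, full.dtype)      # (4, 7) float64

# Select the float fields first to get a float32 (n, N_FLOATS) array,
# and keep the integer labels as their own 1-D array.
floats = rfn.structured_to_unstructured(d[float_names])
labels = d['label']
print(floats.shape, floats.dtype)  # (4, 5) float32
```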