
I have a dictionary of file header values (time, number of frames, year, month, etc) that I would like to write into a numpy array. The code I have currently is as follows:

    arr=np.array([(k,)+v for k,v in fileheader.iteritems()],dtype=["a3,a,i4,i4,i4,i4,f8,i4,i4,i4,i4,i4,i4,a10,a26,a33,a235,i4,i4,i4,i4,i4,i4"])

But I get an error: "can only concatenate tuple (not "int") to tuple".

Basically, the end result needs to be arrays storing the overall file header info (which is 512 bytes) and each frame's data (header and data, 49408 bytes for each frame). Is there an easier way to do this?

Edit: To clarify (for myself as well), I need to write in the data from each frame of the file to an array. I was given matlab code as a base. Here's a rough idea of the code given to me:

    data.frame=zeros([512 96])
    frame=uint8(fread(fid,[data.numbeams,512],'uint8'))
    data.frame=frame

How do I translate the "frame" into python?

Victoria Price
  • Your error has nothing to do with numpy. It's coming from the `(k,)+v` in `[(k,)+v for k,v in fileheader.iteritems()]`. It sounds like you want to use the names of the keys as record names in the numpy array? If so, you need to build the dtype to use those names. Also keep in mind that `dict`s are unordered, which can cause problems with the way you have things written right now. – Joe Kington May 31 '12 at 18:18
  • Thanks! How can I put all of the values in an ordered format? (I have essentially no python experience) – Victoria Price May 31 '12 at 18:20
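To make the cause of the error concrete, here is a minimal sketch. The two-entry `fileheader` and the helper `concat_error_message` are made up for illustration, and `.items()` stands in for `.iteritems()` so the snippet also runs on Python 3. Each value is a plain scalar, so `(k,) + v` tries to add a tuple to an int.

```python
# Hypothetical header: every value is a plain scalar, not a tuple.
fileheader = {'nframes': 120, 'year': 2012}


def concat_error_message(header):
    """Return the message raised by (k,) + v, or None if nothing is raised."""
    try:
        [(k,) + v for k, v in header.items()]
    except TypeError as exc:
        return str(exc)
    return None


message = concat_error_message(fileheader)

# Wrapping the value in its own one-element tuple makes the concatenation valid.
pairs = sorted((k,) + (v,) for k, v in fileheader.items())

print(message)
print(pairs)
```

This only removes the exception. It does not make the resulting pairs match the 23-field dtype in the question.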

2 Answers


You're probably better off just keeping the header data in a dict. Do you really need it as an array? (If so, why? There are some advantages to having the header in a numpy array, but it's more complex than a simple dict, and isn't as flexible.)

One drawback to a dict is that there's no predictable order to its keys. If you need to write your header back to disk in a regular order (similar to a C struct), then you need to separately store the order of the fields, as well as their values. If that's the case, you might consider an ordered dict (collections.OrderedDict) or just putting together a simple class to hold your header data and storing the order there.
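As a rough sketch of the ordered-dict option, assuming three made-up fields (`filetype`, `version`, `num_frames`) with placeholder values: inserting the entries in on-disk order is enough for `collections.OrderedDict` to hand them back in that order.

```python
from collections import OrderedDict

# Hypothetical field names and values; insert them in the on-disk order.
header = OrderedDict()
header['filetype'] = 'abc'
header['version'] = 3
header['num_frames'] = 120

# Keys and values come back in insertion order.
field_order = list(header.keys())
values_in_order = tuple(header.values())

print(field_order)
print(values_in_order)
```

On Python 3.7 and later, a plain `dict` also preserves insertion order, so the separate class is mainly needed on older interpreters.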

Unless there's a good reason to put it into a numpy array, you may not want to.

However, a structured array will preserve the order of your header and will make it easier to write a binary representation of it to disk, but it's inflexible in other ways.

If you did want to make the header an array, you'd do something like this:

import numpy as np

# Lists can be modified, but preserve order. That's important in this case.
names = ['Name1', 'Name2', 'Name3']
# It's "S3" instead of "a3" for a string field in numpy, by the way
formats = ['S3', 'i4', 'f8'] 

# It's often cleaner to specify the dtype this way instead of as a giant string
dtype = dict(names=names, formats=formats)

# This won't preserve the order we're specifying things in!!
# If we iterate through it, things may be in any order.
header = dict(Name1='abc', Name2=456, Name3=3.45)

# Therefore, we'll be sure to pass things in in order...
# Also, np.array will expect a tuple instead of a list for a structured array...
values = tuple(header[name] for name in names)
header_array = np.array(values, dtype=dtype)

# We can access a field in the array like this...
print(header_array['Name2'])

# And dump it to disk (similar to a C struct) with
header_array.tofile('test.dat')
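To check that the file written this way really behaves like a packed C struct, a small round trip can help. This is only a sketch: it reuses the illustrative field names from the example above, writes to a temporary directory instead of `test.dat`, and builds a one-record array (a list holding one tuple) rather than a 0-d array.

```python
import os
import tempfile

import numpy as np

# Same illustrative field names and formats as the example above.
dtype = np.dtype({'names': ['Name1', 'Name2', 'Name3'],
                  'formats': ['S3', 'i4', 'f8']})
header_array = np.array([(b'abc', 456, 3.45)], dtype=dtype)

# Write the record to a throwaway file.
path = os.path.join(tempfile.mkdtemp(), 'test.dat')
header_array.tofile(path)

# The file holds exactly one packed record: 3 + 4 + 8 = 15 bytes.
file_size = os.path.getsize(path)

# count=1 reads a single record and returns an array of shape (1,).
restored = np.fromfile(path, dtype=dtype, count=1)

print(file_size)
print(restored['Name2'][0])
```

Note that the record is written without padding, since structured dtypes are unaligned unless `align=True` is requested.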

On the other hand, if you just want access to the values in the header, just keep it as a dict. It's simpler that way.


Based on what it sounds like you're doing, I'd do something like this. I'm using numpy arrays to read in the header, but the header values are actually being stored as class attributes (as well as the header array).

This looks more complicated than it actually is.

I'm just defining two new classes, one for the parent file and one for a frame. You could do the same thing with a bit less code, but this gives you a foundation for more complex things.

import numpy as np

class SonarFile(object):
    # These define the format of the file header
    header_fields = ('num_frames', 'name1', 'name2', 'name3')
    header_formats = ('i4', 'f4', 'S10', '>u4')

    def __init__(self, filename):
        # Open in binary mode; text mode can corrupt binary data and offsets.
        self.infile = open(filename, 'rb')
        dtype = dict(names=self.header_fields, formats=self.header_formats)

        # Read in the header as a numpy array (count=1 is important here!)
        self.header = np.fromfile(self.infile, dtype=dtype, count=1)

        # Store the position so we can "rewind" to the end of the header
        self.header_length = self.infile.tell()

        # You may or may not want to do this (If the field names can have
        # spaces, it's a bad idea). It will allow you to access things with
        # sonar_file.name1 instead of sonar_file.header['name1'], though.
        # np.fromfile returns a length-1 array, so take element 0 to store scalars.
        for field in self.header_fields:
            setattr(self, field, self.header[field][0])

    # __iter__ is a special function that defines what should happen when we  
    # try to iterate through an instance of this class.
    def __iter__(self):
        """Iterate through each frame in the dataset."""
        # Rewind to the end of the file header
        self.infile.seek(self.header_length)

        # Iterate through frames...
        for _ in range(self.num_frames):
            yield Frame(self.infile)

    def close(self):
        self.infile.close()

class Frame(object):
    header_fields = ('width', 'height', 'name')
    header_formats = ('i4', 'i4', 'S20')
    data_format = 'f4'

    def __init__(self, infile):
        dtype = dict(names=self.header_fields, formats=self.header_formats)
        self.header = np.fromfile(infile, dtype=dtype, count=1)

        # See discussion above...
        for field in self.header_fields:
            setattr(self, field, self.header[field][0])

        # I'm assuming that the size of the frame is in the frame header...
        ncols, nrows = self.width, self.height

        # Read the data in
        self.data = np.fromfile(infile, self.data_format, count=ncols * nrows)

        # And reshape it into a 2d array.
        # I'm assuming C-order, instead of Fortran order.
        # If it's fortran order, just do "data.reshape((ncols, nrows)).T"
        self.data = self.data.reshape((nrows, ncols))

You'd use it similar to this:

dataset = SonarFile('input.dat')

for frame in dataset:
    im = frame.data
    # Do something...

# Release the file handle when finished.
dataset.close()
Joe Kington
  • Well, I suppose the header information doesn't need to be in an array. I do need the frame information in an array, however, to create an image. Bear with me here-- I was thrown into the deep end and tasked with translating matlab code for image processing of data. I know the following: The file header is 512 bytes, each frame is 49408 bytes in size with 256 of them being frame header, and the guy who wrote the matlab code set an initial array of zeros with the dimensions [512,96] (it's a sonar with 96 beams). I need to process each frame of each file. – Victoria Price May 31 '12 at 18:55
  • Negative, I'd like to eventually export the final data into a .txt file. For now, I need to read in each file and the associated image data; it's binary in format. Our end result (way down the line) is to take these image files (collected with a sonar camera) and autonomously locate a target in the sonar beam (for visualization, it's essentially a white circle on a black background). I need to ultimately store the coordinate location of the detected target in a file, if that makes any sense. Thank you so much for your help, I'm a total beginner! – Victoria Price May 31 '12 at 19:13
  • Oh, as an aside-- the program has already been written (semi)successfully in matlab, and we're trying to convert to python. – Victoria Price May 31 '12 at 19:13
  • See the updates. Hope it helps a bit! There's more than one way to do it, and you could do what I've shown with considerably less code, but it would be less flexible if you need to update it in the future. I usually find it's easiest to do something along these lines for reading in similar data. It's quite simple to add writing to disk in the same format to this, as well. (For whatever it's worth, I'm a marine geophysicist as well, and I seem to wind up reading in random binary data formats a lot more often than I'd like.) – Joe Kington May 31 '12 at 19:32
  • So far, success! No errors or anything. Should I ditch the original dictionary I'd written for the headers, then? How can I display the array created for say, the file and initial frame? – Victoria Price May 31 '12 at 19:53
  • By display, do you mean "make a pseudo-color plot" or just print out the numerical values? If you want to plot the data, use matplotlib's `imshow`. To print it, you can just call `print data` to display a summary. – Joe Kington May 31 '12 at 20:23
  • One more question-- the actual image data contained under each frame starts after 256 bytes of frame header information-- how do I ensure that the data getting read into the array is everything from 256 bytes on (i.e. just the image data)? – Victoria Price Jun 04 '12 at 14:33
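One possible way to read only the image data is to seek past the headers before reading. The sketch below takes the sizes quoted in the comments above (512-byte file header, 49408-byte frames, 256-byte frame header, 96 beams). The `uint8` sample type and the (512, 96) layout are assumptions based on the MATLAB snippet in the question, and `read_frame` is a hypothetical helper, not part of the classes above. The in-memory `fake` file only stands in for a real one.

```python
import io

import numpy as np

# Sizes quoted in the comments above; the uint8 sample type and the
# (512, 96) row-major layout are assumptions taken from the MATLAB snippet.
FILE_HEADER_BYTES = 512
FRAME_BYTES = 49408
FRAME_HEADER_BYTES = 256
NUM_SAMPLES, NUM_BEAMS = 512, 96


def read_frame(infile, index):
    """Return the image data of frame `index` as a (512, 96) uint8 array."""
    # Jump past the file header, all earlier frames, and this frame's header.
    infile.seek(FILE_HEADER_BYTES + index * FRAME_BYTES + FRAME_HEADER_BYTES)
    raw = infile.read(FRAME_BYTES - FRAME_HEADER_BYTES)
    data = np.frombuffer(raw, dtype=np.uint8)
    return data.reshape((NUM_SAMPLES, NUM_BEAMS))


# Synthetic stand-in for a real file: two frames filled with 1s and 2s.
image_bytes = FRAME_BYTES - FRAME_HEADER_BYTES
fake = io.BytesIO(
    bytes(FILE_HEADER_BYTES)
    + bytes(FRAME_HEADER_BYTES) + bytes([1]) * image_bytes
    + bytes(FRAME_HEADER_BYTES) + bytes([2]) * image_bytes
)

first = read_frame(fake, 0)
second = read_frame(fake, 1)
print(second.shape, second.dtype)
```

With a real file, `open('input.dat', 'rb')` would replace `fake`. The arithmetic is at least consistent with the stated sizes: 49408 - 256 = 49152 = 512 * 96 one-byte samples. MATLAB's `fread` fills its output column by column, so whether the result needs a transpose to match the original code is worth checking against a known frame. The array returned by `np.frombuffer` is read-only, so call `.copy()` on it if the frame needs to be modified.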

The problem seems to be that v is an int rather than a tuple. Try:

arr=np.array([(k,v) for k,v in fileheader.iteritems()],dtype=["a3,a,i4,i4,i4,i4,f8,i4,i4,i4,i4,i4,i4,a10,a26,a33,a235,i4,i4,i4,i4,i4,i4"])
Matt
  • `k` is going to be a string, here, which doesn't make any sense given the dtype of the array. The OP probably just wants `v`, but I'm not sure.... – Joe Kington May 31 '12 at 18:20
  • I got each dtype from that variable in the fileheader... a3 is a 3-string file type, a is a version, i4 is an int32 number of frames, and so on. Some of the headers in the file correspond to strings, some are floats, and most are 32-bit integers. Does that help? – Victoria Price May 31 '12 at 18:26