
Now that I am quite familiar with Python, I decided to learn C++, so I am very much a n00b, but certainly willing to learn. I have written a script that reads a very tightly specified file format (.EDF, for medical signals), with an ASCII header defined by field sizes in bytes. So I read 8 bytes for the first field, 80 bytes for the second field, and so on.

My working Python script is as follows:

import os

## HEADER FIELD NAMES AND SIZES FROM EDF SPEC:
header_fields = (
('version',     8),    ('patinfo',    80),    ('recinfo',      80),
('start date',  8),    ('start time',  8),    ('header bytes',  8),
('reserved',   44),   ('nrecs',        8),    ('recduration',   8),
('nchannels', 4))

## TELL WHICH FILE TO OPEN
folder = os.path.expanduser('~/Dropbox/01MIOTEC/06APNÉIA/Samples')
with open(os.path.join(folder, 'Osas2002plusQRS.rec'), 'rb') as f:
    # READ FILE CONTENT TO DICTIONARY OF LABELLED FIELD CONTENTS,
    # ALREADY STRIPPED OF BLANK SPACES (the header is pure ASCII)
    header = {}
    for key, value in header_fields:
        header[key] = f.read(value).decode('ascii').strip()

The end result is 'header', a dictionary where each pair is a "labeled" string.

My current awkward C++ code, which almost works (it prints the unstripped strings to the screen), is this:

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

// Field sizes in bytes, in file order, as in the EDF spec (256 bytes in total)
static int header_bytes[] = {8,80,80,8,8,8,44,8,8,4};
static int header_bytes_len = sizeof(header_bytes)/sizeof(int);
// Field names, in the same order as header_bytes
static string header_fields[] =
{
    "version",
    "patinfo",
    "recinfo",
    "start date",
    "start time",
    "header bytes",
    "reserved",
    "nrecs",
    "rec duration",
    "nchannels"
};

int main()
{
    ifstream edfreader;
    edfreader.open("/home/helton/Dropbox/01MIOTEC/06APNÉIA/Samples/Osas2002plusQRS.rec", ios::binary);

    // 81 bytes: room for the longest (80-byte) field plus the '\0' terminator
    char buffer[81];
    for (int n = 0; n<header_bytes_len; n++)
    {
        edfreader.read(buffer, header_bytes[n]);
        buffer[header_bytes[n]] = '\0';
        cout<<"'"<<buffer<<"'"<<endl;
    }
    return 0;
}

Actually, I copy-pasted the last part of main() from a cplusplus.com forum entry just to get some kind of output, but what I really want is to save the fields as an array of string objects, or better yet an array of pointers to string objects. I am reading "C++ Primer", but I'm only 200-odd pages in, and I badly want to fiddle with some C++ code, so if anyone could point me to some methods, concepts or readings, I would be very happy.

Thanks for reading

heltonbiker
  • Is this all binary data? Then you don't want to read it into `std::string` objects, that's not what they're for. Instead, define a `class` for your signals. – Fred Foo Apr 13 '11 at 14:20
  • Actually only the header is pure ASCII; after the header comes the actual signal stream, composed of 16-bit integers (short), and that's why I opened it as binary. But the header is ASCII for sure. – heltonbiker Apr 13 '11 at 14:31
  • Do the ASCII strings always completely fill the field? Or are they nul-terminated? Is every field always nul-terminated (i.e., is an 8-byte field seven characters + a nul)? – Fred Foo Apr 13 '11 at 14:34
  • @larsmans: the .EDF specification requires the rest of each field to be filled with spaces (a null terminator or anything else is forbidden), and the sample files I have opened in Python are like this, so far. – heltonbiker Apr 13 '11 at 15:06
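Since the spec pads every field with trailing spaces and forbids a terminator, one option is to strip only that trailing padding, which keeps any spaces inside the text (a free-text field such as patinfo could contain some). A minimal sketch; `trim_padding` is a hypothetical helper name.

```cpp
#include <string>

// Remove only the trailing ' ' padding after the text of a field;
// spaces inside the text are kept.
std::string trim_padding(const std::string& field) {
    const std::string::size_type last = field.find_last_not_of(' ');
    if (last == std::string::npos) return "";  // field was all padding
    return field.substr(0, last + 1);
}
```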

3 Answers

  • Open the file in binary mode or you may have problems.
  • There is a problem in the way you output the result of reading: it assumes '\0'-terminated strings, which you can't be sure to get (and are sure not to get if your fields are padded with spaces). Enlarge the buffer by one byte and add a '\0' after reading:

    buffer[header_bytes[n]] = '\0';
    
AProgrammer
  • Thanks, @AProgrammer, I edited the question to reflect your fine suggestions. My output is much better now, too, since each field is terminated correctly by the `'\0'`. – heltonbiker Apr 13 '11 at 14:40

Create a class/struct that describes the file format, similar to what you did in Python:

   struct Header {
    char version[8];
    char patinfo[80];
    ..., 
    };

Then open the file in binary mode and read the record using the above struct:

ifstream file( "filename", ios::binary );
Header H;
file.read( reinterpret_cast<char*>(&H), sizeof(H) );

This reads the header record, and you can then access the contents of the struct, but you need to be careful not to treat the members as strings, since they may or may not have a terminating \0.

You can do it in a fancier way than above, but this is just a quick change to your existing code instead of creating a more elaborate class / file handling.

AndersK
  • Thanks for your suggestions, I thought of creating a more structured way of doing things. I will read your idea with care soon, 'cause right now I'm too n00b to understand the implications of it. – heltonbiker Apr 13 '11 at 14:44
  • Note that there will probably be no padding between the fields of the struct, but there is no guarantee of that. – AProgrammer Apr 13 '11 at 14:44
  • -1 for teaching a new C++ user a low-level, unreliable, unportable hack. – Fred Foo Apr 13 '11 at 17:10

Assuming there are no spaces in the fields except for the padding, you can read them into C++ strings using:

#include <cstddef>
#include <istream>
#include <string>
#include <vector>

/* Read field of n bytes */
std::string read_field(std::istream &edfreader, std::size_t n)
{
    // there's no need for new;
    // in fact, new may lead to a memory leak if you forget to delete
    std::vector<char> buf(n);

    // read as a sequence of bytes
    edfreader.read(&buf.front(), static_cast<std::streamsize>(n));

    // find the first space or end of buffer
    std::size_t end = 0;
    while (end < n && buf[end] != ' ')
        end++;

    // make a string object from the first `end` bytes of the buffer
    return std::string(&buf.front(), end);
}

std::string does memory allocation for you; you can use it pretty much like a Python string, except that it is modifiable.
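As an example of Python-like use, numeric fields such as nrecs or nchannels hold ASCII digits padded with spaces. `std::stoi` (C++11) converts the leading number and reports how many characters it consumed, which makes a strict check of the rest easy. A sketch; `field_to_int` is a hypothetical name, and rejecting anything other than trailing spaces is a choice made here, roughly mirroring `int(s.strip())` in Python.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert a space-padded numeric header field to int.
// std::stoi skips leading whitespace and throws std::invalid_argument
// if there are no digits at all; anything other than spaces after the
// number is rejected explicitly.
int field_to_int(const std::string& field) {
    std::size_t used = 0;
    const int value = std::stoi(field, &used);
    if (field.find_first_not_of(' ', used) != std::string::npos)
        throw std::invalid_argument("unexpected characters in numeric field: " + field);
    return value;
}
```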

The only assumptions made here are that your OS's character set is (a superset of) ASCII and that exceptions are enabled on edfreader, because the function itself never checks whether the read succeeded.

Fred Foo
  • I chose your answer for now, because it does what I wanted: store the content of each read() as a (auto memory-managed) std::string object. Thanks for your interest, now it's homework! – heltonbiker Apr 13 '11 at 18:06
  • What is that function supposed to do? buf[80] followed by a read(buf, 8) ... ? Then there is the assumption that n is <= 8, otherwise the while loop may read uninitialized data. And if read(buf, 8) is meant to be read(buf, n), there is the assumption that n <= 80, since otherwise you would overflow the buffer. std::vector would be just fine as a buffer. Yes, it's slower, but I don't think that it would be an issue here. So... a little more than "just one assumption". And bad ones too. – Paul Groke Apr 14 '11 at 21:18