1

I am in larval stage with Python and pre-egg stage in C++, but i am trying to do my best, specially with the "Don't Repeat Yourself" principle.

I have a multichannel raw file-format to open, with a main ascii header with fields representable as strings and integers (always coded as chars padded with white spaces). The second part is N headers, with N being a field of the main header, and each of those headers has itself a lot more of text and number fields (coded as ascii) refering to the length and size of the actual 16 bit multichannel streams that compose the rest of the file.

So far, I have this working code in C++:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <map>

using namespace std;

struct Header {
    string version;
    string patinfo;
    string recinfo;
    string start_date;
    string start_time;
    int header_bytes;
    string reserved;
    int nrecs;
    double rec_duration;
    int nchannels;
};

struct Channel {
    string label;
    string transducertype;
    string phys_dim;
    int pmin;
    int pmax;
    int dmin;
    int dmax;
    string prefiltering;
    int n_samples;
    string reserved;
};


int main()
{
    ifstream edf("/home/helton/Dropbox/01MIOTEC/06APNÉIA/Samples/Osas2002plusQRS.rec", ios::binary);

    // prepare to read file header
    Header header;
    char buffer[80];

    // reads header fields into the struct 'header'
    edf.read(buffer, 8);
    header.version = string(buffer, 8);

    edf.read(buffer, 80);
    header.patinfo = string(buffer, 80);

    edf.read(buffer, 80);
    header.recinfo = string(buffer, 80);

    edf.read(buffer, 8);
    header.start_date = string(buffer, 8);

    edf.read(buffer, 8);
    header.start_time = string(buffer, 8);

    edf.read(buffer, 8);
    stringstream(buffer) >> header.header_bytes;

    edf.read(buffer, 44);
    header.reserved = string(buffer, 44);

    edf.read(buffer, 8);
    stringstream(buffer) >> header.nrecs;

    edf.read(buffer,8);
    stringstream(buffer) >> header.rec_duration;

    edf.read(buffer,4);
    stringstream(buffer) >> header.nchannels;

    /*
    cout << "'" << header.version << "'" << endl;
    cout << "'" << header.patinfo << "'" << endl;
    cout << "'" << header.recinfo << "'" << endl;
    cout << "'" << header.start_date << "'" << endl;
    cout << "'" << header.start_time << "'" << endl;
    cout << "'" << header.header_bytes << "'" << endl;
    cout << "'" << header.reserved << "'" << endl;
    cout << "'" << header.nrecs << "'" << endl;
    cout << "'" << header.rec_duration << "'" << endl;
    cout << "'" << header.nchannels << "'" << endl;
    */

    // prepare to read channel headers
    int ns = header.nchannels; // ns tells how much channels I have
    char title[16]; // 16 is the specified length of the "label" field of each channel

    for (int n = 0; n < ns; n++)
    {
        edf >> title;
        cout << title << endl; // and this successfully echoes the label of each channel
    }


    return 0;
};

Some remarks I already have to make:

  • I opted to use struct because the format specification is very hardcoded;
  • I didn't iterate over the main header fields because the number of bytes and types to read seemed to me rather arbitrary;
  • Now that I successfully got each channel's label, I would actually create structs for each channel's fields, which by themselves would have to be stored perhaps in a map.

My (hopefully straightforward) question is:

"Should I worry about cutting corners to make this kind of code more 'Pythonic' (more abstract, less repetitive), or this is not the way things work in C++?"

Many Python evangelists (as I would be myself, because I love it) highlight its easyness to use and all that. So, I will wonder for some time if I am doing dumb things or only doing things right, but not so "automagical" because of the very nature of C++.

Thanks for reading

Helton

heltonbiker
  • 26,657
  • 28
  • 137
  • 252
  • 10
    I think that in almost all cases, DRY [Don't Repeat Yourself](http://en.wikipedia.org/wiki/Don%27t_repeat_yourself) code is a worthy goal. Non-DRY code often leaves me going "WTF? Why make things so hard?!?" However, C++ code is C++ code and not Python code. Trying to achieve DRY-ness should still be done within scope of the language and conventional/supported constructs. Abstraction is also nice, but the scope/requirement of the work should also be considered (don't make a mountain from a molehill). –  Apr 14 '11 at 20:17
  • @pst: Very nice and insightful thought, thanks. – heltonbiker Apr 14 '11 at 20:49

5 Answers5

5

I'd say there's no such thing as Pythonic C++ code. The DRY principle applies in both languages, but much of what is considered "Pythonic" is simply the shortest, sweetest way of expressing logic in Python, using Python-specific constructs. Idiomatic C++ is quite different.

lambda, for example, is sometimes not considered very Pythonic and reserved for cases where no other solution exists, but is just being added to the C++ standard. C++ has no keyword arguments, which are very Pythonic. C++ programmers don't like constructing a map when not necessary, while a Python programmer might throw dict at a lot of problems where they just happen to make the intention clearer than the efficient alternative.

If you want to save typing, use the function I posted earlier, then:

header.version = read_field(edf, 8);
header.patinfo = read_field(edf, 80);

That should save you quite a few lines. But more important than those few lines is that you've achieved a small amount of modularity: how to read a field and what fields to read are now separate parts of your program.

Community
  • 1
  • 1
Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • DRY isn't language specific, and it is universal. Just a thought. – eckza Apr 14 '11 at 20:28
  • Not really universal. Doesn't really fit in a strongly typed language that doesn't support generics. – Mike Ramirez Apr 14 '11 at 20:35
  • 1
    @Mike: then you could deploy a preprocessor in order to remain DRY. Preprocessing can be icky, but then so are strongly typed languages without generics. – Steve Jessop Apr 14 '11 at 21:41
  • @Steve Jessop FTR, I was referring to Go lang, where fast compile times are important and a feature. And IMO DRY isn't that important for preprocessing. – Mike Ramirez Apr 14 '11 at 22:04
  • @Mike: Go has its duck-typed interfaces, though, which supports some of the repetition-avoidance tactics that you'd do with generics in C++. Just not all. If I were creating some fantastically powerful container for Go, and didn't want to hack the language itself, then I'd certainly do it with a preprocessor in preference to copy and paste for each type. I very much don't want to have to update all those copies when I change an implementation detail. But I haven't used Go very much, and I don't know what the dedicated fans are up to. – Steve Jessop Apr 14 '11 at 22:33
  • @Steve Jessop That's cool, reflection is used a lot for this, which is fugly, but helps cut down the repetitiousness, but at the end of the day. We choose to do it differently. I don't think DRY is that important to go through preprocessing. – Mike Ramirez Apr 15 '11 at 02:20
  • Readability and maintainability are important too, though, and heavy preprocessing can badly hurt both of those. – ncoghlan Apr 15 '11 at 02:33
  • @larsmans, I am very thankful for your support, and I have to apologize for leaving the impression that I did not wanted to follow your class vs. struct answer. Actually, I am at the very beginning of C++ learning, and have never declared a function or a class in it yet. Also, I am still looking for the implications of these choices, and since class can have member functions (methods), it is obviously the right choice in this case. Anyway, now I have already declared my first stuct, too. Thanks again. – heltonbiker Apr 15 '11 at 13:29
2

You are correct: as written, the code is repetitive (and has no error checking). Each field that you read really requires you to take three or five steps, depending on the type of data being read:

  1. Read the field from the stream
  2. Ensure the read succeeded
  3. Parse the data (if necessary)
  4. Ensure the parse succeeded (if necessary)
  5. Copy the data into the target location

You can wrap all three of these up into a function so that the code is less repetitive. For example, consider the following function templates:

template <typename TStream, typename TResult>
void ReadFixedWidthFieldFromStream(TStream& str, TResult& result, unsigned sz) 
{
    std::vector<char> data(sz);

    if (!str.read(&data[0], sz))
        throw std::runtime_error("Failed to read from stream");

    std::stringstream ss(&data[0]);
    if (!(ss >> result))
        throw std::runtime_error("Failed to parse data from stream");
}

// Overload for std::string:
template <typename TStream>
void ReadFixedWidthFieldFromStream(TStream& str, std::string& result, unsigned sz) 
{
    std::vector<char> data(sz);

    if (!str.read(&data[0], sz))
        throw std::runtime_error("Failed to read from stream");

    result = std::string(&data[0], sz);
}

Now your code can be much more succinct:

ReadFixedWidthFieldFromStream(edf, header.version, 8);
ReadFixedWidthFieldFromStream(edf, header.patinfo, 80);
ReadFixedWidthFieldFromStream(edf, header.recinfo, 80);
// etc.
James McNellis
  • 348,265
  • 75
  • 913
  • 977
  • `typename TStream`? What's wrong with good old `std::istream`? – Fred Foo Apr 14 '11 at 20:27
  • 1
    There's nothing wrong with `std::istream`; it was already a function template so I made the stream parameter generic as well. In production code I'd probably also use a custom exception class, but for demonstration purposes `std::runtime_error` suffices. – James McNellis Apr 14 '11 at 20:32
  • Interestingly enough, templating on the stream type reminds me of Python's file-like objects. – Fred Foo Apr 14 '11 at 20:45
  • Depending on the type of data, I'd probably also consider using `basic_istream` and using `CharT` for the buffer. I'm going to stop thinking about this now, though, because I'll keep coming up with more improvements. – James McNellis Apr 14 '11 at 20:48
  • these are interesting approaches. I will have to learn a lot before being able to understand the whole template thing, but when I do I will come back to see this answer. Thanks. – heltonbiker Apr 15 '11 at 13:33
1

This code is straightforward, simple, and easy to understand. If it's working, don't waste time changing it. I'm sure there's plenty of badly written, complex, and difficult to understand (and probably incorrect) code that should be fixed first :)

Larry Watanabe
  • 10,126
  • 9
  • 43
  • 46
0

For reading from file directly in strings see this question The rest is wrong. but personally I think there's a better/cleaner way of doing this.

If you know the size of the structure don't use string, use primitive C types (and make sure the structure is packed). See these links: http://msdn.microsoft.com/en-us/library/2e70t5y1(v=vs.80).aspx & http://gcc.gnu.org/onlinedocs/gcc-3.2.3/gcc/Type-Attributes.html

I would do it this way for example (not sure about the size of each string but you get the idea):

struct Header {
    char version[8];
    char patinfo[80];
    char recinfo[80];
    char start_date[8];
    char start_time[8];
    int header_bytes;
    char reserved[44];
    int nrecs;
    double rec_duration;
    int nchannels;
};

Once you have a packed structure you can read it directly from the file:

struct Header h;
edf.read(&h,sizeof(struct Header));

For me this is the cleanest way to do it, but remember you must have your structure packed so that you have the guarantee that the structure in memory has the same size as the structure saved in the file - this is not very hard to see while testing.

Community
  • 1
  • 1
INS
  • 10,594
  • 7
  • 58
  • 89
  • This is certainly not the cleanest way as **it cannot be done portably**. There's no such a thing as a packed structure in standard C++. – Fred Foo Apr 14 '11 at 20:35
  • All decent compilers support this feature which unfortunately is not mentioned in the standard. I don't remember any real compiler not supporting this, and I've used some compilers. – INS Apr 14 '11 at 20:38
  • 2
    This fails to account for the fact that the numeric fields are stored as plaintext and need to be parsed. This solution also leaves you with a set of fixed-width strings which are not as convenient to deal with in C++ as null-terminated strings or `std::string`. – James McNellis Apr 14 '11 at 20:39
  • different compilers have different syntaxes for packing, so it's still not portable. Worse, the MSVC `#pragma pack` may be silently ignored by other compilers, leading to subtle bugs when porting. – Fred Foo Apr 14 '11 at 20:41
0

The Zen of Python doesn't mention DRY explicitly.

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
John La Rooy
  • 295,403
  • 53
  • 369
  • 502
  • Sometimes, I'm just glad to be Dutch :) – Fred Foo Apr 14 '11 at 20:44
  • Forgive me if I'm wrong, but 'Zen of Python' is a subset of 'All possible ways of being pythonic'. DRY seems to me very pythonic. But I appreciate your contribution: one cannot underemphasize the importance of Zen, even in C++. – heltonbiker Apr 15 '11 at 14:12
  • @heltonbiker, DRY is a good principle in any language, but sometimes it conflicts with Beautiful, Explicit, Simple, Flat and Readability. Readability is especially important in my opinion, and don't forget practicality beats purity – John La Rooy Apr 15 '11 at 14:19