3

I want to parse some data that can arrive in one of two layouts. The first one is

struct typeA {
  int id;
  char name[16];
  int value1;
  int value2;
  int value3;
  int value4;
} __attribute__ ((packed));

The second possibility is that the name field is twice as long:

struct typeB {
  int id;
  char name[32];
  int value1;
  int value2;
  int value3;
  int value4;
} __attribute__ ((packed));

So far so good. Now I have two functions that parse these two types:

int parse_typeA(struct typeA *x){ /* do some stuff */ }
int parse_typeB(struct typeB *x){ /* do some stuff */ }

This is obviously impractical if you have more types. How can I parse both types with a single function that takes an additional parameter, like

int parse_any_type(void *x, int type){ 
    /*
     *  WHAT TO DO HERE ??
     *  
     *  The following doesn't work
     *  
     *  if(type == 1)
     *    struct typeA *a = (struct typeA *)x;
     *  else
     *    struct typeB *a = (struct typeB *)x;
     */
    printf("%i\n", a->id);
    printf("%s\n", a->name);
    printf("%i\n", a->value1);
    printf("%i\n", a->value2);
    printf("%i\n", a->value3);
    printf("%i\n", a->value4);
}

Anyone any idea?

Peter F.
  • You'll find a number of questions about making c do object oriented programming on the site. You're starting down that road with this, though you can't do *quite* what you're asking for. That said, it is usually more trouble than it is worth. For a real project why not just use c++? After all, polymorphism is one of the things it *adds*. – dmckee --- ex-moderator kitten Nov 24 '11 at 20:59
  • @dmckee - How do you know this isn't a 'real project'? – James Nov 24 '11 at 21:04
  • I guess you're right. Seems like this is just almost impossible using plain C. I just didn't want to give up :-) – Peter F. Nov 24 '11 at 21:08
  • @James: I don't and I make no judgement. I condition the suggestion because Peter may be working this issue for his own edification in which case the suggestion is pointless. – dmckee --- ex-moderator kitten Nov 24 '11 at 21:10
  • The function name says you want to *parse* data, but your example function says you want to do the opposite (i.e. *serialise*) -- This is a big difference for possible solutions. Which one do you actually need? – bitmask Nov 24 '11 at 21:29
  • Your parse code looks awfully like printing code to me. – Jonathan Leffler Nov 24 '11 at 21:30
  • If the structures have compatible member names then you could macroise the parse function so that you could generate it multiple times one for each structure. – Neil Nov 26 '11 at 22:45

6 Answers

3

It depends on how general your solution must be. As other answers have identified, the two example structures are extremely similar, and therefore can be managed relatively easily (though deciding how to determine the end of the character string presents some problems).

If you need a more general system, you'll probably need to look at some sort of 'structure descriptor' string, which you pass to the converter, or possibly a 'structure descriptor array'.

For example, the strings might be:

"i s16 i i i i"  // typeA
"i s32 i i i i"  // typeB

"u32 i64 z d d"  // struct { uint32_t a; int64_t b; size_t c; double d; double e; };

int parse_any_type(void *output, const char *desc);

You then have to deal with some alignment and padding issues, but (as long as you get the descriptor strings correct) you can write a routine to handle that lot (packed or unpacked).
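As a rough illustration of walking such a descriptor string, here is a minimal sketch that only computes the packed size of a record. The function name packed_size is hypothetical, and it understands just two codes, "i" (an int) and "sN" (a char array of N bytes); any other code, including the "u32" and "i64" examples above, is reported as unknown.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns the packed size in bytes described by a
 * string such as "i s16 i i i i", or 0 if an unknown field code is found.
 * Only "i" (int) and "sN" (char[N]) are handled in this sketch. */
size_t packed_size(const char *desc)
{
    size_t total = 0;

    while (*desc != '\0') {
        if (isspace((unsigned char)*desc)) {
            desc++;                          /* skip separators */
        } else if (*desc == 'i' && !isdigit((unsigned char)desc[1])) {
            total += sizeof(int);            /* plain int field */
            desc++;
        } else if (*desc == 's' && isdigit((unsigned char)desc[1])) {
            char *end;
            total += (size_t)strtoul(desc + 1, &end, 10);  /* char[N] */
            desc = end;
        } else {
            return 0;                        /* unknown code */
        }
    }
    return total;
}
```

A real converter would dispatch on the same codes to read or write each field instead of only adding up sizes.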

Using 'descriptors', you'd probably be dealing with one of the less well known macros in C, the offsetof macro defined in <stddef.h>. You'd create a descriptor type such as:

enum Type { CHAR, UCHAR, SCHAR, STR, USTR, SSTR, SHORT, USHORT, INT, UINT, LONG, ULONG, ... };
struct descriptor
{
    enum Type  m_type;    // Code for the variable type
    size_t     m_size;    // Size of type
    size_t     m_offset;  // Offset of variable in structure
};

struct descriptor d_TypeA[] =
{
    { INT, sizeof(int), offsetof(struct typeA, id)     },
    { STR,          16, offsetof(struct typeA, name)   },
    { INT, sizeof(int), offsetof(struct typeA, value1) },
    { INT, sizeof(int), offsetof(struct typeA, value2) },
    { INT, sizeof(int), offsetof(struct typeA, value3) },
    { INT, sizeof(int), offsetof(struct typeA, value4) },
};

You can then pass the appropriate type descriptor array (and the size of that array) to the function, along with the pointer to where the data is to be stored.

Instead of using an enumeration, you might use a function pointer type which points to the correct converter.

int parse_structure(void *output, const struct descriptor *desc, size_t n_desc);
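To make the descriptor-array idea concrete, here is a minimal sketch under some assumptions of mine: the input is a packed byte buffer in the host's byte order, the enumeration is cut down to INT and STR, and the function is called parse_from_buffer (a hypothetical name with two extra arguments for the input, so it is not the prototype above). The in-memory struct is deliberately not packed, since offsetof takes care of any padding.

```c
#include <stddef.h>
#include <string.h>

enum Type { INT, STR };            /* reduced to the two codes used here */

struct descriptor {
    enum Type m_type;              /* code for the variable type */
    size_t    m_size;              /* size of the field in bytes */
    size_t    m_offset;            /* offset of the field in the struct */
};

struct typeA {                     /* in-memory form, not packed */
    int  id;
    char name[16];
    int  value1, value2, value3, value4;
};

static const struct descriptor d_typeA[] = {
    { INT, sizeof(int), offsetof(struct typeA, id)     },
    { STR,          16, offsetof(struct typeA, name)   },
    { INT, sizeof(int), offsetof(struct typeA, value1) },
    { INT, sizeof(int), offsetof(struct typeA, value2) },
    { INT, sizeof(int), offsetof(struct typeA, value3) },
    { INT, sizeof(int), offsetof(struct typeA, value4) },
};

/* Copies each field from the packed input to its place in *output.
 * Returns 0 on success, -1 if the input is too short. */
int parse_from_buffer(void *output, const struct descriptor *desc,
                      size_t n_desc, const unsigned char *input,
                      size_t input_len)
{
    size_t pos = 0;

    for (size_t i = 0; i < n_desc; i++) {
        char *dst = (char *)output + desc[i].m_offset;

        if (desc[i].m_size > input_len - pos)
            return -1;
        memcpy(dst, input + pos, desc[i].m_size);
        if (desc[i].m_type == STR)
            dst[desc[i].m_size - 1] = '\0';  /* force termination */
        pos += desc[i].m_size;
    }
    return 0;
}
```

A second array describing typeB would differ only in the STR size and the struct named in offsetof; the loop stays the same.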

Another alternative is that you simply deal with each type with an appropriate function which calls other simpler functions to handle each piece of the structure.

int parse_TypeA(struct typeA *output)
{
    if (parse_int(&output->id)      == 0 &&
        parse_str(output->name, 16) == 0 &&
        parse_int(&output->value1)  == 0 &&
        parse_int(&output->value2)  == 0 &&
        parse_int(&output->value3)  == 0 &&
        parse_int(&output->value4)  == 0)
        return 0;
    /* ...diagnose error... */
    return -1;
}

Your examples have not clearly identified where the data comes from, as opposed to where it is to be stored. This may not matter, but will affect the solution. Given no arguments, it might be reasonable to expect the data to be read from standard input. Alternatively, you might have a string containing the data to be parsed, possibly with a length, too; these would be arguments to the function.
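If the data arrives as a buffer with a length, one way to make the source explicit is to thread a small cursor through the helper functions. This is only a sketch: struct cursor and take are hypothetical names, and these parse_int and parse_str take an extra cursor argument compared with the calls shown above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical input cursor over an in-memory buffer. */
struct cursor {
    const unsigned char *p;     /* next unread byte */
    size_t               left;  /* bytes remaining */
};

/* Copies n bytes out of the cursor; returns 0 on success, -1 if short. */
static int take(struct cursor *c, void *dst, size_t n)
{
    if (n > c->left)
        return -1;
    memcpy(dst, c->p, n);
    c->p    += n;
    c->left -= n;
    return 0;
}

/* Reads one int in host byte order. */
int parse_int(struct cursor *c, int *out)
{
    return take(c, out, sizeof *out);
}

/* Reads a fixed-size character field of n bytes. */
int parse_str(struct cursor *c, char *out, size_t n)
{
    return take(c, out, n);
}
```

Each helper returns 0 or -1, so the chained `== 0 &&` style shown earlier still works and the caller learns whether the input ran out.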

Your examples have not illustrated error handling; how will the calling code know whether the conversion was successful or not?

If done correctly, the same description mechanism can be used for both the parsing and the printing mechanisms - your parse_any_type() function looks more like a printing function.


Jonathan Leffler
  • That's it dude :-) Thanks. I could use the offsetof Operator to realign the structure by calculating the pointer offset. :-) I'll give this a try. Never heard/read of that operator before... – Peter F. Nov 24 '11 at 21:33
  • Another variation on the above that occurred to me was to pass an array of pointers to functions that would retrieve the appropriate member from the structure. Sort of like a C++ object, but having the virtual dispatch table separate from the object itself. – Neil Nov 25 '11 at 23:53
  • @Neil: yes - that is a valid variation. I alluded to it in the sentence '[i]nstead of using an enumeration, you might use a function pointer type which points to the correct converter.' – Jonathan Leffler Nov 26 '11 at 00:37
  • Sorry, I misunderstood you to mean a pointer to a function that handles the appropriate type of structure. – Neil Nov 26 '11 at 22:43
0

Well, the only difference between the two structs is the number of characters in the name member. If your struct held this as a char* (in memory) instead, what I think you want would work. When the id has been read you can malloc the appropriate size and read it in, and then the rest of the struct.

James
  • The example above is just to explain the problem. What I really want to do is to parse PE files (exe, dll, ..) that are given in a specific format with exactly the differences shown above between 32bit and 64bit files. So there's no way to modify the layout for easier parsing. – Peter F. Nov 24 '11 at 21:10
  • Are you worried about the invention of 128-bit files? What PE structures are you trying to read? – James Nov 24 '11 at 21:17
  • I am trying to read the optional header section where the imageBase address can be 32 or 64 bit width. – Peter F. Nov 24 '11 at 21:24
0

You can certainly do

int parse_any_type(void *x, int type){ 
  int id;
  char *name;
  int value1;
  int value2;
  int value3;
  int value4;    

  if(type == 1) {
     id   = ((struct typeA*)x)->id;
     name = ((struct typeA*)x)->name;
     /* ... */
  } else {
     id = ((struct typeB*)x)->id;
     /* ... */
  }

  printf("%i\n", id);
  printf("%s\n", name);
  printf("%i\n", value1);
  /* ... */
}

but it is a bit awkward and repetitive.
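One way to keep the repetition in a single place, sketched here as a variation of my own rather than something the question asked for, is to copy the narrow layout into the wide one once and then run all further code on struct typeB only. The helper names widen_typeA and print_typeB are hypothetical, and `__attribute__ ((packed))` is the GCC/Clang extension already used in the question.

```c
#include <stdio.h>
#include <string.h>

struct typeA {
    int  id;
    char name[16];
    int  value1, value2, value3, value4;
} __attribute__ ((packed));

struct typeB {
    int  id;
    char name[32];
    int  value1, value2, value3, value4;
} __attribute__ ((packed));

/* Copies a typeA record into the wider typeB layout, field by field. */
void widen_typeA(struct typeB *dst, const struct typeA *src)
{
    memset(dst, 0, sizeof *dst);          /* zero-fills the longer name */
    dst->id = src->id;
    memcpy(dst->name, src->name, sizeof src->name);
    dst->value1 = src->value1;
    dst->value2 = src->value2;
    dst->value3 = src->value3;
    dst->value4 = src->value4;
}

/* The only function that needs to know how to print a record. */
void print_typeB(const struct typeB *b)
{
    printf("%i\n", b->id);
    printf("%.32s\n", b->name);           /* name may lack a terminator */
    printf("%i\n", b->value1);
    printf("%i\n", b->value2);
    printf("%i\n", b->value3);
    printf("%i\n", b->value4);
}
```

The field-by-field copy still has to be written once per narrow type, but everything after it exists only once.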

dmckee --- ex-moderator kitten
  • The repetition is what I am trying to save. When I have say 60 lines to cast 30 value and print them, I could just use these 60 lines to print the values. – Peter F. Nov 24 '11 at 21:28
0

You can use a union inside the structure for the array, so its size will be decided on assignment. However, I don't know exactly how it will work with your packed attribute.

stdcall
0

Each member access needs to know the structure layout. And because you don't know which structure you're using ahead of time, you're going to have to duplicate code in some form in order to handle both layouts. However if you have a known number of structures you can hide the nitty-gritty behind a macro:

#define MEMBER(_ptr, _type, _name) ((_type) ? ((struct typeA *)(_ptr))->_name : ((struct typeB *)(_ptr))->_name)

printf("%i\n", MEMBER(a, type, value1));
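For completeness, here is a sketch of how the whole function might look with such a macro. It assumes the struct names from the question, and it assumes that type == 1 selects typeA and any other value selects typeB; the function name print_any_type is mine.

```c
#include <stdio.h>

struct typeA {
    int  id;
    char name[16];
    int  value1, value2, value3, value4;
} __attribute__ ((packed));

struct typeB {
    int  id;
    char name[32];
    int  value1, value2, value3, value4;
} __attribute__ ((packed));

/* Picks member m from whichever layout t selects (1 means typeA). */
#define MEMBER(p, t, m) \
    ((t) == 1 ? ((const struct typeA *)(p))->m \
              : ((const struct typeB *)(p))->m)

/* Prints every field; assumes name is NUL-terminated in the data. */
int print_any_type(const void *x, int type)
{
    printf("%i\n", MEMBER(x, type, id));
    printf("%s\n", MEMBER(x, type, name));
    printf("%i\n", MEMBER(x, type, value1));
    printf("%i\n", MEMBER(x, type, value2));
    printf("%i\n", MEMBER(x, type, value3));
    printf("%i\n", MEMBER(x, type, value4));
    return 0;
}
```

The duplication is still there after preprocessing, since each use expands to both casts, but it no longer has to be typed out for every member.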
Neil
0

I find some of the answers overly complicated.

From my point of view, the simplest way to solve this problem is to use a union in your data structure. For the one you provided in your example, it would be something like:

struct typeU
{
    int id;
    int name_len;

    union
    {
        char _16[16];
        char _32[32];
    } name;

    int value1;
    int value2;
    int value3;
    int value4;
} __attribute__ ((packed));

The print function would be something like:

void typeU_print(struct typeU *t)
{
    printf("%i\n", t->id);

    switch (t->name_len)
    {
        case 16:
            printf("%s\n", t->name._16);
        break;

        case 32:
            printf("%s\n", t->name._32);
        break;
    }

    printf("%i\n", t->value1);
    printf("%i\n", t->value2);
    printf("%i\n", t->value3);
    printf("%i\n", t->value4);
}
Quetzy Garcia