11

I'm trying to write a tool that will take as input some C code containing structs. It will compile the code, then find and output the size and offset of any padding the compiler decides to add to structs within it. This is pretty straightforward to do by hand for a known struct using offsetof, sizeof, and some addition, but I can't figure out an easy way to do it automatically for any input struct.
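
For reference, the by-hand calculation can be sketched as follows. This is a minimal sketch, not part of any tool: `struct sample`, `PAD_BETWEEN`, `PAD_AFTER_LAST`, and `report_sample_padding` are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical example struct; any struct with named members works the same way. */
struct sample {
    char   a;
    double b;
    int    c;
};

/* Bytes of padding between the end of `member` and the start of `next`. */
#define PAD_BETWEEN(type, member, next) \
    (offsetof(type, next) - (offsetof(type, member) + sizeof(((type *)0)->member)))

/* Bytes of padding between the end of the last member and the end of the struct. */
#define PAD_AFTER_LAST(type, last) \
    (sizeof(type) - (offsetof(type, last) + sizeof(((type *)0)->last)))

/* Prints the padding found in struct sample and returns the total in bytes. */
size_t report_sample_padding(void)
{
    size_t after_a = PAD_BETWEEN(struct sample, a, b);
    size_t after_b = PAD_BETWEEN(struct sample, b, c);
    size_t after_c = PAD_AFTER_LAST(struct sample, c);

    printf("after a: %zu\nafter b: %zu\nafter c: %zu\n", after_a, after_b, after_c);
    return after_a + after_b + after_c;
}
```

The member names have to be spelled out for every struct, which is exactly the step that needs automating.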

If I knew how to iterate through all elements in a struct, I think I could get the tool written with no problems, but as far as I know there's no way to do that. I'm hoping some StackOverflow people will know a way. However, I'm not set on this approach, and I'm open to any alternative way of finding padding in a struct.

toolic
Desert ed
  • I don't understand what you're asking for. Do you want to build a generic reflection system for the C language? Wanting reflection in C is a bit like wanting to cross the Atlantic Ocean with a motorbike... – Alexandre C. Jul 20 '10 at 20:22
  • Sorry if I wasn't clear in my question. I'm not looking for a generic reflection system (I hope!). I figured my solution would involve parsing the C source code with Perl and generating some modified C code with a sizeof and offsetof call for each element in the struct. That would provide the size and location of all the elements in the struct, and from that it's trivial to find and report any padding the struct contains. Does that seem like a reasonable approach, or would you go about the problem a different way? – Desert ed Jul 20 '10 at 20:45

10 Answers

7

Isn't this what pahole does?

ninjalj
  • It is indeed *exactly* what `pahole` does, although it operates on binaries compiled with debugging information rather than source. – caf Jul 20 '10 at 23:02
4

Say you have the following module.h:

typedef void (*handler)(void);

struct foo {
  char a;
  double b;
  int c;
};

struct bar {
  float y;
  short z;
};

A Perl program to generate unpack templates begins with the customary front matter:

#! /usr/bin/perl

use warnings;
use strict;

sub usage { "Usage: $0 header\n" }

The structs sub feeds the header to ctags and collects struct members from its output. The result is a hash whose keys are names of structs and whose values are arrays of pairs of the form [$member_name, $type].

Note that it handles only a few C types.

sub structs {
  my($header) = @_;

  open my $fh, "-|", "ctags", "-f", "-", $header
    or die "$0: could not start ctags";

  my %struct;
  while (<$fh>) {
    chomp;
    my @f = split /\t/;
    next unless @f >= 5 &&
                $f[3] eq "m" &&
                $f[4] =~ /^struct:(.+)/;

    my $struct = $1;
    die "$0: unknown type in $f[2]"
      unless $f[2] =~ m!/\^\s*(float|char|int|double|short)\b!;

    # [ member-name => type ]
    push @{ $struct{$struct} } => [ $f[0] => $1 ];
  }

  wantarray ? %struct : \%struct;
}

Assuming that the header can be included by itself, generate_source generates a C program that prints offsets to the standard output, fills structs with dummy values, and writes the raw structures to the standard output preceded by their respective sizes in bytes.

sub generate_source {
  my($struct,$header) = @_;

  my $path = "/tmp/my-offsets.c";
  open my $fh, ">", $path
    or die "$0: open $path: $!";

  print $fh <<EOStart;
#include <stdio.h>
#include <stddef.h>
#include <$header>
void print_buf(void *b, size_t n) {
  char *c = (char *) b;
  printf("%zu\\n", n);
  while (n--) {
    fputc(*c++, stdout);
  }
}

int main(void) {
EOStart

  my $id = "a1";
  my %id;
  foreach my $s (sort keys %$struct) {
    $id{$s} = $id++;
    print $fh "struct $s $id{$s};\n";
  }

  my $value = 0;
  foreach my $s (sort keys %$struct) {
    for (@{ $struct->{$s} }) {
      print $fh <<EOLine;
printf("%zu\\n", offsetof(struct $s,$_->[0]));
$id{$s}.$_->[0] = $value;
EOLine
      ++$value;
    }
  }

  print $fh qq{printf("----\\n");\n};

  foreach my $s (sort keys %$struct) {
    print $fh "print_buf(&$id{$s}, sizeof($id{$s}));\n";
  }
  print $fh <<EOEnd;
  return 0;
}
EOEnd

  close $fh or warn "$0: close $path: $!";
  $path;
}

Generate a template for unpack where the parameter $members is a value in the hash returned by structs that has been augmented with offsets (i.e., arrayrefs of the form [$member_name, $type, $offset]):

sub template {
  my($members) = @_;

  my %type2tmpl = (
    char => "c",
    double => "d",
    float => "f",
    int => "i!",
    short => "s!",
  );

  join " " =>
  map '@![' . $_->[2] . ']' . $type2tmpl{ $_->[1] } =>
  @$members;
}

Finally, we reach the main program where the first task is to generate and compile the C program:

die usage unless @ARGV == 1;
my $header = shift;

my $struct = structs $header;
my $src    = generate_source $struct, $header;

(my $cmd = $src) =~ s/\.c$//;
system("gcc -I`pwd` -o $cmd $src") == 0
  or die "$0: gcc failed";

Now we read the generated program's output and decode the structs:

my @todo = map @{ $struct->{$_} } => sort keys %$struct;

open my $fh, "-|", $cmd
  or die "$0: start $cmd failed: $!";
while (<$fh>) {
  last if /^-+$/;
  chomp;
  my $m = shift @todo;
  push @$m => $_;
}

if (@todo) {
  die "$0: unfilled:\n" .
      join "" => map "  - $_->[0]\n", @todo;
}

foreach my $s (sort keys %$struct) {
  chomp(my $length = <$fh> || die "$0: unexpected end of input");
  my $bytes = read $fh, my($buf), $length;
  if (defined $bytes) {
    die "$0: unexpected end of input" unless $bytes;
    print "$s: @{[unpack template($struct->{$s}), $buf]}\n";
  }
  else {
    die "$0: read: $!";
  }
}

Output:

$ ./unpack module.h 
bar: 0 1
foo: 2 3 4

For reference, the C program generated for module.h is

#include <stdio.h>
#include <stddef.h>
#include <module.h>
void print_buf(void *b, size_t n) {
  char *c = (char *) b;
  printf("%zu\n", n);
  while (n--) {
    fputc(*c++, stdout);
  }
}

int main(void) {
struct bar a1;
struct foo a2;
printf("%zu\n", offsetof(struct bar,y));
a1.y = 0;
printf("%zu\n", offsetof(struct bar,z));
a1.z = 1;
printf("%zu\n", offsetof(struct foo,a));
a2.a = 2;
printf("%zu\n", offsetof(struct foo,b));
a2.b = 3;
printf("%zu\n", offsetof(struct foo,c));
a2.c = 4;
printf("----\n");
print_buf(&a1, sizeof(a1));
print_buf(&a2, sizeof(a2));
  return 0;
}
Greg Bacon
  • Thank you for a great explanation and a stand alone program without many dependencies. Does this handle nested structs? – Desert ed Jul 21 '10 at 15:58
  • @Desert ed As written, it handles flat structures with members of only a few intrinsic types. Handling nested structs wouldn't take much: `template` would need to call itself when it sees a type that is another struct, and of course `structs` would need to be more lenient in the types that it accepts. – Greg Bacon Jul 21 '10 at 18:28
  • wow, longest answer in a question that's not one of those "What your favorite ___?" – mrk Oct 24 '12 at 01:13
3

I prefer to read and write into a buffer, then have a function load the structure members from the buffer. This is more portable than reading directly into a structure or using memcpy. It also removes any worry about compiler padding and can be adjusted to handle endianness.

A correct and robust program is worth more than any time spent compacting binary data.
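
A minimal sketch of that approach, assuming a made-up wire format: `struct record`, `record_load`, the helper names, and the 7-byte little-endian layout are all hypothetical and only illustrate the idea.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record. Assumed wire layout, with no gaps between fields:
   1-byte tag, 4-byte little-endian count, 2-byte little-endian flags. */
struct record {
    uint8_t  tag;
    uint32_t count;
    uint16_t flags;
};

enum { RECORD_WIRE_SIZE = 7 };

/* Assemble integers byte by byte so host endianness never matters. */
static uint16_t get_u16le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t get_u32le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Loads one record from buf; returns 0 on success, -1 if buf is too short. */
int record_load(struct record *out, const unsigned char *buf, size_t len)
{
    if (len < RECORD_WIRE_SIZE)
        return -1;
    out->tag   = buf[0];
    out->count = get_u32le(buf + 1);
    out->flags = get_u16le(buf + 5);
    return 0;
}
```

Because each member is assigned individually, whatever padding the compiler puts inside `struct record` never reaches the wire.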

Thomas Matthews
  • Unfortunately I don't have any say in the existing program this is designed to help, so I can't just use buffers. My goal is to minimize the errors introduced when data structures change. Presently, a change to the data structure changes the byte offset of all following struct members in a non-predictable way. I'm just trying to add some predictability there. – Desert ed Jul 20 '10 at 20:33
2

Have your tool parse the struct definition to find the names of the fields, then generate C code that prints a description of the struct padding, and finally compile and run that C code. Sample Perl code for the second part:

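printf "const char *const field_names[] = {%s};\n",
       join(", ", map {"\"$_\""} @field_names);
printf "const size_t offsets[] = {%s, %s};\n",
       join(", ", map {"offsetof(struct $struct_name, $_)"} @field_names),
       "sizeof(struct $struct_name)";
# sizeof needs an expression naming the member; the null pointer is never dereferenced
printf "const size_t sizes[] = {%s};\n",
       join(", ", map {"sizeof(((struct $struct_name *)0)->$_)"} @field_names);
print <<'EOF';
for (size_t i = 0; i < sizeof(field_names)/sizeof(*field_names); i++) {
    /* distance to the next member (or the end of the struct) minus this member's size */
    size_t padding = offsets[i+1] - offsets[i] - sizes[i];
    printf("After %s: %zu bytes of padding\n", field_names[i], padding);
}
EOF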

C is very difficult to parse, but you're only interested in a very small part of the language, and it sounds like you have some control over your source files, so a simple parser should do the trick. A search of CPAN turns up Devel::Tokenizer::C and a few C:: modules as candidates (I know nothing about them other than their names). If you really need an accurate C parser, there is CIL, but you have to write your analysis in OCaml.
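
As a rough illustration, the C that such a generator emits for one struct might look like the sketch below. The struct, the `NFIELDS` constant, and the function names are hypothetical. Note the per-field size table: the distance between two consecutive offsets includes the member itself, so its size has to be subtracted to leave only the padding.

```c
#include <stddef.h>
#include <stdio.h>

/* Stand-in for a struct found by the parser (hypothetical example). */
struct foo {
    char   a;
    double b;
    int    c;
};

#define NFIELDS 3

static const char *const field_names[NFIELDS] = {"a", "b", "c"};

/* One offset per field, plus the struct size as a final "next offset". */
static const size_t offsets[NFIELDS + 1] = {
    offsetof(struct foo, a),
    offsetof(struct foo, b),
    offsetof(struct foo, c),
    sizeof(struct foo)
};

/* Size of each field; the null pointer is only named, never dereferenced. */
static const size_t sizes[NFIELDS] = {
    sizeof(((struct foo *)0)->a),
    sizeof(((struct foo *)0)->b),
    sizeof(((struct foo *)0)->c)
};

/* Padding that follows field i. */
size_t padding_after(size_t i)
{
    return offsets[i + 1] - offsets[i] - sizes[i];
}

/* Prints one line per field. */
void print_padding(void)
{
    for (size_t i = 0; i < NFIELDS; i++)
        printf("After %s: %zu bytes of padding\n", field_names[i], padding_after(i));
}
```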

Gilles 'SO- stop being evil'
2

Hack up Convert::Binary::C.

daxim
2

You could use Exuberant Ctags to parse your source files instead of using a CPAN module or hacking something up yourself. For instance, for the following code:

typedef struct _foo {
    int a;
    int b;
} foo;

ctags emits the following:

_foo    x.c     /^typedef struct _foo {$/;"     s                               file:
a       x.c     /^    int a;$/;"                m       struct:_foo             file:
b       x.c     /^    int b;$/;"                m       struct:_foo             file:
foo     x.c     /^} foo;$/;"                    t       typeref:struct:_foo     file:

The first, fourth, and fifth columns should be enough for you to determine what struct types exist and what their members are. You could use that information to generate a C program that determines how much padding each struct type has.
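
The column handling could be sketched like this. `emit_offsetof` is a hypothetical helper, and the sketch assumes tab-separated ctags output in which the search pattern contains no literal tab (that is, the source is indented with spaces) and lines are shorter than 512 bytes.

```c
#include <stdio.h>
#include <string.h>

/* Parses one tab-separated ctags line. If it describes a struct member
   (kind "m" in column 4, "struct:NAME" in column 5), writes a C statement
   that prints the member's offset into out and returns 1; otherwise returns 0. */
int emit_offsetof(const char *line, char *out, size_t outsz)
{
    char copy[512];
    char *col[5];
    size_t n = 0;

    if (strlen(line) >= sizeof copy)
        return 0;
    strcpy(copy, line);
    copy[strcspn(copy, "\r\n")] = '\0';   /* drop a trailing newline, if any */

    /* Split on tabs in place; only the first five columns matter. */
    char *p = copy;
    while (n < 5 && p != NULL) {
        col[n++] = p;
        p = strchr(p, '\t');
        if (p != NULL)
            *p++ = '\0';
    }
    if (n < 5 || strcmp(col[3], "m") != 0 || strncmp(col[4], "struct:", 7) != 0)
        return 0;

    const char *struct_name = col[4] + 7;  /* text after "struct:" */
    snprintf(out, outsz, "printf(\"%%zu\\n\", offsetof(struct %s, %s));",
             struct_name, col[0]);
    return 1;
}
```

Feeding every line of the ctags output through a function like this yields the body of a small C program that reports each member's offset.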

Russell Silva
2

You might try pstruct.

I've never used it, but I was looking for some way you might be able to use stabs and this sounds like it would fit the bill.

If it doesn't, I would suggest looking at other ways to parse out stabs info.

daxim
bstpierre
  • pstruct is a good suggestion, and works for some basic C files I've thrown at it. It also has the advantage of being already installed on most machines. It's having some trouble with nested structs though. – Desert ed Jul 21 '10 at 21:08
1

If you have access to Visual C++, you can add the following pragma to have the compiler spit out where and how much padding was added:

#pragma warning(default : 4820)

At that point you can probably just consume the output of cl.exe and go party.

MSN
0

There is no C++ language feature to iterate through the members of a struct, so I think you're out of luck.

You might be able to cut down some of the boiler-plate with a macro, but I think you're stuck specifying all the members explicitly.
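
One way to do that is the "X-macro" idiom, sketched below with hypothetical names (`FOO_FIELDS`, `foo_fields`, `foo_total_padding`). It assumes you are free to change how the struct is declared, and that every member is a simple non-array type so that `sizeof(type)` gives the member's size.

```c
#include <stddef.h>
#include <stdio.h>

/* The member list is written exactly once (hypothetical example struct). */
#define FOO_FIELDS(X) \
    X(char,   a)      \
    X(double, b)      \
    X(int,    c)

/* First expansion: declare the struct itself. */
#define DECLARE_MEMBER(type, name) type name;
struct foo { FOO_FIELDS(DECLARE_MEMBER) };

/* Second expansion: build a table describing every member. */
struct field_info { const char *name; size_t offset; size_t size; };
#define DESCRIBE_MEMBER(type, name) { #name, offsetof(struct foo, name), sizeof(type) },
static const struct field_info foo_fields[] = { FOO_FIELDS(DESCRIBE_MEMBER) };
static const size_t foo_nfields = sizeof foo_fields / sizeof foo_fields[0];

/* Prints the gap after each member and returns the total padding in bytes. */
size_t foo_total_padding(void)
{
    size_t total = 0;
    for (size_t i = 0; i < foo_nfields; i++) {
        size_t end  = foo_fields[i].offset + foo_fields[i].size;
        size_t next = (i + 1 < foo_nfields) ? foo_fields[i + 1].offset
                                            : sizeof(struct foo);
        printf("after %s: %zu\n", foo_fields[i].name, next - end);
        total += next - end;
    }
    return total;
}
```

The members are still listed explicitly, but only in one place, so the table cannot drift out of sync with the struct.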

Mark Ransom
-1

I don't believe that any general-purpose facility exists for introspection/reflection in C. That's what Java or C# are for.

Steven Sudit
  • It was likely downvoted because it didn't contribute to the question. Just because there is no introspection doesn't mean there's no way to find the padding, as the many helpful answers (and my working program) show. – Desert ed Jul 27 '10 at 20:26
  • @Desert: That turns out not to be the case. It was one of thirteen answers downvoted within seconds of each other by a user who I offended. – Steven Sudit Jul 27 '10 at 20:43