Can someone explain this definition of the 'dirent' struct in Solaris?

Question

Recently I was looking at the 'dirent' structure (in dirent.h) and was a little puzzled by its definition.

NOTE: This header file is from a Solaris machine at my school.


typedef struct dirent {
    ino_t       d_ino;
    off_t       d_off;
    unsigned short  d_reclen;
    char        d_name[1];
} dirent_t;

Particularly the d_name field. How would this work in the operating system? If you need to store a null-terminated string, what good is an array of a single char? I know that you can get the address of an array through its first element, but I am still confused. Obviously something is happening, but I don't know what. On my Fedora Linux system at home this field is simply defined as:

char d_name[256];

Now that makes a lot more sense for obvious reasons. Can someone explain why the Solaris header file defines the struct as it does?

score 10 · Accepted Answer · answered Feb 18 '09 at 23:58

As others have pointed out, the last member of the struct doesn't have any set size. The array is however long the implementation decides it needs to be to accommodate the characters it wants to put in it. It does this by dynamically allocating the memory for the struct, such as with malloc.

It's convenient to declare the member as having size 1, though, because it's easy to determine how much memory is occupied by any dirent variable d:

sizeof(dirent_t) + strlen(d.d_name)

Using size 1 also discourages the recipient of such struct values from trying to store their own names in it instead of allocating their own dirent values. Using the Linux definition, it's reasonable to assume that any dirent value you have will accept a 255-character string, but Solaris makes no guarantee that its dirent values will store any more characters than they need to.
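To make the allocation concrete, here is a minimal sketch of how an implementation might build one such record. The names `my_dirent_t` and `my_dirent_new` are hypothetical, the field types are simplified stand-ins for `ino_t` and `off_t`, and this is not taken from the Solaris sources. Note that writing past the declared `d_name[1]` is a long-standing idiom rather than something ISO C strictly blesses, so the sketch copies through a pointer computed from the start of the allocation.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the Solaris layout shown in the question. */
typedef struct my_dirent {
    unsigned long  d_ino;
    long           d_off;
    unsigned short d_reclen;
    char           d_name[1];   /* first byte of a longer name */
} my_dirent_t;

/* Allocate one record big enough for `name` and its terminating NUL.
 * The declared d_name[1] already pays for the NUL, so strlen(name)
 * extra bytes are sufficient. Returns NULL if malloc fails. */
my_dirent_t *my_dirent_new(const char *name)
{
    size_t len = strlen(name);
    size_t size = sizeof(my_dirent_t) + len;
    my_dirent_t *d = malloc(size);
    if (d == NULL)
        return NULL;

    d->d_ino = 0;
    d->d_off = 0;
    /* A real implementation would also check that size fits the field. */
    d->d_reclen = (unsigned short)size;

    /* Copy the name, NUL included, into the space that starts at d_name. */
    char *dst = (char *)d + offsetof(my_dirent_t, d_name);
    memcpy(dst, name, len + 1);
    return d;
}
```

A caller would use it as `my_dirent_t *d = my_dirent_new("hello.txt");` and release it with a single `free(d)`.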

I think it was C99 that introduced a special case for the last member of a struct. The struct could be declared like this instead:

typedef struct dirent {
  ino_t d_ino;
  off_t d_off;
  unsigned short d_reclen;
  char d_name[];
} dirent_t;

The array has no declared size. This is known as the flexible array member. It accomplishes the same thing as the Solaris version, except that there's no illusion that the struct by itself could hold any name. You know by looking at it that there's more to it.

Using the "flexible" declaration, the amount of memory occupied would be adjusted like so:

sizeof(dirent_t) + strlen(d.d_name) + 1

That's because the flexible array member itself contributes nothing to the size of the struct, so the byte for the terminating NUL has to be counted explicitly.
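As a rough illustration of the flexible-array version, the sketch below allocates a record using the `+ 1` formula. `flex_dirent_t` and `flex_dirent_new` are hypothetical names and the field types are simplified. Because `sizeof` may still include trailing padding after `d_reclen`, the formula can over-allocate by a few bytes, which is harmless.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical C99 variant of the struct, with a flexible array member. */
typedef struct flex_dirent {
    unsigned long  d_ino;
    long           d_off;
    unsigned short d_reclen;
    char           d_name[];    /* contributes no bytes to sizeof */
} flex_dirent_t;

/* Allocate a record that holds `name` plus its terminating NUL.
 * Returns NULL if malloc fails. */
flex_dirent_t *flex_dirent_new(const char *name)
{
    size_t len = strlen(name);
    size_t size = sizeof(flex_dirent_t) + len + 1;  /* +1 for the NUL */
    flex_dirent_t *d = malloc(size);
    if (d == NULL)
        return NULL;

    d->d_ino = 0;
    d->d_off = 0;
    /* A real implementation would also check that size fits the field. */
    d->d_reclen = (unsigned short)size;
    memcpy(d->d_name, name, len + 1);   /* well-defined with a flexible array */
    return d;
}
```

Unlike the size-1 idiom, indexing `d_name` beyond its first element here is explicitly permitted by the language, as long as the allocation is large enough.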

The reason you don't see flexible declarations like that more often, especially in OS library code, is likely for the sake of compatibility with older compilers that don't support that facility. It's also for compatibility with code written to target the current definition, which would break if the size of the struct changed like that.

Actually, the entry d_reclen holds the real size of this struct instance, you do not have to compute it yourself. — raimue, Feb 19 '09 at 02:43
Ah, you're right. The one allocating the struct still needs to figure out how much to allocate, though, so it's still nice to have an easy way to calculate it. — Rob Kennedy, Feb 19 '09 at 04:42
I returned to this issue when implementing my own ext2 driver for my hobby OS. The key for me is the way I am using this structure. I read directory entries off the disk, then cast the buffer to a `dirent_t`. The flexible array member then corresponds to the file name characters. Then simply read `name_len` characters starting at d.name and that's that. It actually works quite nicely in practice. You probably need to understand how rec_len is used to see why it makes sense. — Mr. Shickadance, Sep 29 '11 at 15:07
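For readers wondering how a record-length field ties this together, here is a small sketch of walking a buffer of packed, variable-length records. The layout (`rec_header_t`), the 4-byte rounding, and the helpers `put_record` and `count_records` are all assumptions made up for illustration; they are not the actual ext2 or Solaris formats. The header is copied out with `memcpy` rather than read through a cast pointer, to sidestep alignment concerns.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header that precedes each name in the buffer. */
typedef struct rec_header {
    uint32_t inode;
    uint16_t rec_len;   /* total bytes in this record, padding included */
    uint16_t name_len;  /* bytes of name; the name is not NUL-terminated */
} rec_header_t;

/* Write one record at `out` and return its length.
 * The caller must make sure the buffer has enough room. */
size_t put_record(unsigned char *out, uint32_t inode, const char *name)
{
    rec_header_t h;
    size_t name_len = strlen(name);
    size_t rec_len = (sizeof h + name_len + 3) & ~(size_t)3; /* round up to 4 */

    h.inode = inode;
    h.rec_len = (uint16_t)rec_len;
    h.name_len = (uint16_t)name_len;
    memcpy(out, &h, sizeof h);
    memcpy(out + sizeof h, name, name_len);
    memset(out + sizeof h + name_len, 0, rec_len - sizeof h - name_len);
    return rec_len;
}

/* Step through buf[0..buf_len) one record at a time using rec_len.
 * Returns the number of records, or -1 if the buffer is malformed. */
int count_records(const unsigned char *buf, size_t buf_len)
{
    size_t pos = 0;
    int count = 0;

    while (pos < buf_len) {
        rec_header_t h;
        if (buf_len - pos < sizeof h)
            return -1;                      /* truncated header */
        memcpy(&h, buf + pos, sizeof h);
        if (h.rec_len < sizeof h + h.name_len || h.rec_len > buf_len - pos)
            return -1;                      /* inconsistent length */
        /* The name occupies buf[pos + sizeof h .. + h.name_len). */
        count++;
        pos += h.rec_len;                   /* jump to the next record */
    }
    return count;
}
```

The point of the length field is that the reader never needs to compute a record's size from the name; it just adds `rec_len` to reach the next entry.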

score 5 · Answer 2 · answered Feb 18 '09 at 22:10

The dirent struct will be immediately followed in memory by a block of memory that contains the rest of the name, and that memory is accessible through the d_name field.

Rob K

score 5 · Answer 3 · answered Feb 18 '09 at 22:11

This is a pattern used in C to indicate an arbitrary-length array at the end of a structure. Arrays in C have no built-in bounds checking, so when your code tries to access the string starting at d_name, it will continue past the end of the structure. This relies on readdir() allocating enough memory to hold the entire string plus the terminating NUL.
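From the caller's side none of this is visible. A minimal sketch using the standard POSIX calls `opendir`, `readdir`, and `closedir` looks the same whether `d_name` is declared with size 1 or 256; `list_dir` is a hypothetical helper name.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Print every entry name in `path`.
 * Returns the number of entries seen, or -1 if the directory
 * could not be opened. */
long list_dir(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        /* d_name is a NUL-terminated string owned by the library;
         * read it, but do not write a longer name into it. */
        printf("%s (%zu characters)\n", ent->d_name, strlen(ent->d_name));
        count++;
    }
    closedir(dir);
    return count;
}
```

Calling `list_dir(".")` lists the current directory. Code that needs to keep a name after the next `readdir` call should copy the string into its own buffer rather than copy the struct by value.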

Commodore Jaeger

Why not just use a pointer in the struct though? To save a few bytes? I suppose at the OS level that may be the case. – Mr. Shickadance Feb 18 '09 at 22:49
A pointer doesn't do the same thing. To use a pointer would require multiple memory allocations -- one for the dirent structure and one for the name, with the dirent pointing to the name. Using the single-byte-array pattern means one single allocation with d_name being the first byte of the name. – Andrew Feb 18 '09 at 23:14
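For contrast, a pointer-based entry might look like the sketch below. `ptr_entry_t`, `ptr_entry_new`, and `ptr_entry_free` are hypothetical names used only to show the extra allocation and the extra cleanup step.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical pointer-based alternative: the name lives elsewhere. */
typedef struct ptr_entry {
    unsigned long d_ino;
    char         *d_name;   /* separately allocated string */
} ptr_entry_t;

/* Two allocations: one for the struct, one for the name.
 * Returns NULL if either allocation fails. */
ptr_entry_t *ptr_entry_new(const char *name)
{
    ptr_entry_t *e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;

    size_t len = strlen(name);
    e->d_name = malloc(len + 1);
    if (e->d_name == NULL) {
        free(e);            /* undo the first allocation on failure */
        return NULL;
    }
    memcpy(e->d_name, name, len + 1);
    e->d_ino = 0;
    return e;
}

/* Two frees, in the right order. */
void ptr_entry_free(ptr_entry_t *e)
{
    if (e != NULL) {
        free(e->d_name);
        free(e);
    }
}
```

Besides the second `malloc`, the name is no longer contiguous with the rest of the entry, so the record cannot be handed around as one flat block of bytes the way the trailing-array version can.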
Got ya. Thanks for the clarification, I wasn't considering the extra allocation needed. – Mr. Shickadance Feb 19 '09 at 00:50

score 1 · Answer 4 · answered Feb 18 '09 at 22:12

It looks like a micro-optimization to me. Names are commonly short, so why allocate space that you know will go unused? Also, Solaris may support names longer than 255 characters. To use such a struct, you just allocate the needed space and ignore the supposed array size.
