8

I was wanting a simple string table that will store a bunch of constants and I thought "Hey! Lua does that, let me use some of there functions!"

This is mainly in the lstring.h/lstring.c files (I am using 5.2)

I will show the code I am curious about first. Its from lobject.h

/*
** Header for string value; string bytes follow the end of this structure
*/
typedef union TString {
  L_Umaxalign dummy;  /* ensures maximum alignment for strings */
  struct {
    CommonHeader;
    lu_byte reserved;
    unsigned int hash;
    size_t len;  /* number of characters in string */
  } tsv;
} TString;


/* get the actual string (array of bytes) from a TString */
#define getstr(ts)  cast(const char *, (ts) + 1)

/* get the actual string (array of bytes) from a Lua value */
#define svalue(o)       getstr(rawtsvalue(o))

As you see, the data is stored outside of the structure. To get the byte stream, you take the size of TString, add 1, and you got the char* pointer.

Isn't this bad coding though? Its been DRILLED into m in my C classes to make clearly defined structures. I know I might be stirring a nest here, but do you really lose that much speed/space defining a structure as header for data rather than defining a pointer value for that data?

Paul Bruner
  • 435
  • 4
  • 8
  • Note that it wouldn't be that difficult to just use the public Lua API to access a table of your strings. That way you don't need to reach too far under the hood. But that said, I fully understand the desire to learn why they implemented it the way they did. – RBerteig Jan 25 '12 at 20:31

3 Answers3

5

The idea is probably that you allocate the header and the data in one big chunk of data instead of two:

TString *str = (TString*)malloc(sizeof(TString) + <length_of_string>);

In addition to having just one call to malloc/free, you also reduce memory fragmentation and increase memory localization.

But answering your question, yes, these kind of hacks are usually a bad practice, and should be done with extreme care. And if you do, you'll probably want to hide them under a layer of macros/inline functions.

rodrigo
  • 94,151
  • 12
  • 143
  • 190
2

As rodrigo says, the idea is to allocate the header and string data as a single chunk of memory. It's worth pointing out that you also see the non-standard hack

struct lenstring {
  unsigned length;
  char data[0];
};

but C99 added flexible array members so it can be done in a standard compliant way as

struct lenstring {
  unsigned length;
  char data[];
};

If Lua's string were done in this way it'd be something like

typedef union TString {
  L_Umaxalign dummy;
  struct {
    CommonHeader;
    lu_byte reserved;
    unsigned int hash;
    size_t len;
    const char data[];
  } tsv;
} TString;

#define getstr(ts) (ts->tsv->data)
Geoff Reedy
  • 34,891
  • 3
  • 56
  • 79
  • You could make the first hack portable by using `char data[1]` – sbk Jan 24 '12 at 00:27
  • 6
    Note: you're not likely to see lua shift to the new syntax for flexible arrays due to its requirement to be portable to a large range of compilers (some quite old). – Michael Anderson Jan 24 '12 at 00:27
  • Ah I just read http://stackoverflow.com/questions/4412749/are-flexible-array-members-valid-in-c about this. Its funny how much info you can find about a subject when you know the name for it. – Paul Bruner Jan 24 '12 at 16:01
  • 2
    @sbk Actually, it is portable in practice, but undefined in theory, as the behavior of accessing beyond the declared bounds of the array is undefined in the standard. – rodrigo Jan 24 '12 at 17:05
1

It relates to the complications arising from the more limited C language. In C++, you would just define a base class called GCObject which contains the garbage collection variables, then TString would be a subclass and by using a virtual destructor, both the TString and it's accompanying const char * blocks would be freed properly.

When it comes to writing the same kind of functionality in C, it's a bit more difficult as classes and virtual inheritance do not exist.

What Lua is doing is implementing garbage collection by inserting the header required to manage the garbage collection status of the part of memory following it. Remember that free(void *) does not need to know anything other than the address of the memory block.

#define CommonHeader GCObject *next; lu_byte tt; lu_byte marked

Lua keeps a linked list of these "collectable" blocks of memory, in this case an array of characters, so that it can then free the memory efficiently without knowing the type of object it is pointing to.

If your TString pointed to another block of memory where the character array was, then it require the garbage collector determine the object's type, then delve into its structure to also free the string buffer.

The pseudo code for this kind of garbage collection would be something like this:

GCHeader *next, *prev;
GCHeader *current = firstObject;

while(current)
{
    next = current->next;
    if (/* current is ready for deletion */)
    {
        free(current);

        // relink previous to the next (singly-linked list)
        if (prev)
            prev->next = next;
    }
    else
        prev = current; // store previous undeleted object
    current = next;
}
Nick Bedford
  • 4,365
  • 30
  • 36