How are pascal strings laid out in memory?

I read http://www.freepascal.org/docs-html/ref/refsu12.html, which says that strings are stored on the heap and reference counted. To find out where the length and the reference count are stored, I created a string and ran some tests on it:

type PInt = ^Integer;

var
    str: String;
begin
    str := 'hello';
    // Step back in bytes (via PChar), then read the Integer stored at that address.
    writeln(PInt(PChar(@str[1]) - (sizeof(integer) * 1))^); //length
    writeln(PInt(PChar(@str[1]) - (sizeof(integer) * 2))^); //reference count
end.

The first line prints the length and the second prints the reference count, both as expected.
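The layout those two reads imply can be sketched from the C++ side as follows. This is only an illustration under assumptions: both fields are taken to be 32 bits wide and to sit immediately before the character data, which matches what the test above observed but is not guaranteed across compiler versions or targets. The names `AnsiHeader` and `ReadHeader` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumed header placed directly in front of the character data.
// Field widths and order are an assumption based on the test above.
struct AnsiHeader {
    std::int32_t refcount;  // at data - 8
    std::int32_t length;    // at data - 4
};

// Copy out the header that precedes `data` (the address of the first character).
// memcpy is used instead of a pointer cast to avoid alignment and aliasing issues.
inline AnsiHeader ReadHeader(const char* data) {
    assert(data != nullptr);
    AnsiHeader header;
    std::memcpy(&header, data - sizeof(AnsiHeader), sizeof header);
    return header;
}
```

Being able to read the header this way says nothing about who is allowed to free the block, which is what the rest of the question runs into.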

Now I tried to emulate the same thing in C:

Export char* NewCString()
{
    const char* hello_ptr = "hello";

    int length = strlen(hello_ptr);

    //allocate space on the heap for:  sizeof(refcount) + sizeof(length) + strlength + 1 (null terminator written by strcpy)
    char* pascal_string = (char*)malloc((sizeof(int) * 2) + length + 1);

    *((int*)&pascal_string[0]) = 0; //reference count to 0.
    *((int*)&pascal_string[sizeof(int)]) = length;  //length of the string.

    strcpy(&pascal_string[sizeof(int) * 2], hello_ptr); //copy hello to the pascal string.

    return &pascal_string[sizeof(int) * 2]; //return a pointer to the data.
}

Export void FreeCString(char* &ptr)
{
    int data_offset = sizeof(int) * 2;
    free(ptr - data_offset);
    ptr = NULL;
}

Then in pascal I do:

var
    str: string;
begin
    str := string(NewCString());
    writeln(PInt(PChar(@str[1]) - (sizeof(integer) * 1))^); //length - prints 5. correct.
    writeln(PInt(PChar(@str[1]) - (sizeof(integer) * 2))^); //reference count - prints 1! correct.
   //FreeCString(str);  //works fine if I call this..
end.

The pascal code prints the length correctly and the reference count is increased by one due to the assignment. This is correct.

However, as soon as it is finished executing, it crashes badly! It seems to be trying to free the string/heap. If I call FreeCString myself, it works just fine! I'm not sure what is going on.

Any ideas why it crashes?

Brandon
  • You're confusing multiple versions of Pascal (Wirth/Turbo Pascal stored the length in byte 0; since Delphi 2 introduced long strings, strings no longer do unless they're declared as `ShortString`). You've listed four different languages in your tags. Instead, why don't you explain what you're actually trying to accomplish in the first place, and ask how to do that? It crashes because you're making assumptions that aren't correct. – Ken White Jan 05 '14 at 06:04
  • There. I've narrowed down the languages. I'm trying to convert a c-style string into a pascal string without having to also pass the length as a parameter. – Brandon Jan 05 '14 at 06:09
  • What does "I'm trying to convert a c-style string into a pascal string" mean? Delphi/Free Pascal can accept null-terminated C-style strings perfectly well without a length parameter; it's done thousands of times in every single Windows application (via the WinAPI calls made). Once again, what are you actually trying to accomplish? – Ken White Jan 05 '14 at 06:13
  • `char* &ptr` is C++. Not sure whether to retag the question. – Potatoswatter Jan 05 '14 at 06:18
  • @KenWhite I tried doing it normally, I get access violation. I also tried using a PChar. That only prints the first character in my string. – Brandon Jan 05 '14 at 07:15
  • To start with, every Delphi/FPC allocation stores the size of the allocation. Pascal Runtime code might access it. – Marco van de Voort Jan 06 '14 at 15:37

2 Answers

  1. "string" is an alias that can refer to three different string types (ShortString, AnsiString and UnicodeString).
  2. The layout of AnsiString and UnicodeString changed between FPC 2.6 and FPC 2.7.x+ (comparable to the change from Delphi 2007 to Delphi 2009).
  3. Any Delphi memory allocator must be able to tell the size of an allocated block. Usually this is done by storing the 32-bit size in the block.
  4. Free Pascal and Delphi have pluggable memory allocators. The default Free Pascal manager is its own suballocator. To make it use whatever libc uses (on *nix), put unit cmem first in the uses clause of your main program.
  5. Because AnsiString and UnicodeString are reference counted, you are responsible for keeping the reference count intact if you use manual tricks. That includes respecting the Pascal ABI at every Pascal <-> C boundary.

In short: don't. In the rare case that you must, add a constructor and a destructor function on the Pascal side, and do all allocation through them.
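A related way to keep every allocation on the Pascal side, without exposing the string header to C++ at all, is the caller-provided-buffer pattern familiar from WinAPI. This is a swapped-in variant, not the constructor/destructor pair described above, and the function name `CopyHello` is hypothetical. The C++ side only fills memory that the caller already owns:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical export: copies "hello" into a buffer owned by the caller.
// Returns the text length (excluding the terminating NUL) so the caller can
// size its buffer. Nothing is written unless the buffer can hold the text
// plus the terminator.
extern "C" int CopyHello(char* buffer, int capacity) {
    static const char kText[] = "hello";
    const int length = static_cast<int>(sizeof(kText) - 1);
    assert(capacity >= 0);
    if (buffer != nullptr && capacity > length) {
        std::memcpy(buffer, kText, length + 1);  // includes the NUL
    }
    return length;
}
```

A Pascal caller would call it once with a nil buffer to learn the length, size its own string accordingly, and call it again to fill the string. The Pascal runtime then allocates and frees the string with its own memory manager, so the reference count and allocator bookkeeping are never touched from C++.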

P.S. You may want to have a look at rtl/inc/astrings.inc.

P.P.S. On Windows it might be easiest to use the COM-compatible WideString (BSTR) as the interlanguage string type.

Marco van de Voort
  • I fixed it by changing the reference count to -1, which makes Pascal think the string is a constant. It therefore never tries to "free" my strings and I get to handle that myself. Works fine now. I'll try what you said though. – Brandon Jan 06 '14 at 18:45
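The workaround described in this comment might look roughly like the sketch below. It assumes the same two-`int` header as the question's code and relies on the asker's report that a reference count of -1 is treated as a constant string. The function names are hypothetical, and none of this is a documented interface.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Bytes reserved in front of the text: refcount, then length (assumed layout).
constexpr std::size_t kHeaderSize = sizeof(int) * 2;

// Build a block that looks like a constant Pascal string and return a
// pointer to its character data. The caller must release it with
// FreeConstPascalString, never through the Pascal runtime.
char* NewConstPascalString(const char* text) {
    assert(text != nullptr);
    const int length = static_cast<int>(std::strlen(text));
    char* block = static_cast<char*>(std::malloc(kHeaderSize + length + 1));
    if (block == nullptr) return nullptr;
    const int refcount = -1;  // reported to mark the string as constant
    std::memcpy(block, &refcount, sizeof refcount);
    std::memcpy(block + sizeof(int), &length, sizeof length);
    std::memcpy(block + kHeaderSize, text, length + 1);  // text plus NUL
    return block + kHeaderSize;
}

// Release a block created by NewConstPascalString.
void FreeConstPascalString(char* data) {
    if (data != nullptr) std::free(data - kHeaderSize);
}
```

Because the block still comes from `malloc`, it has to be released by the matching C++ function, and any Pascal variable that still refers to it afterwards is left dangling.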

Just because the runtime system lays out strings a particular way in memory, doesn't mean that writing C code to duplicate that memory layout will work. String management may involve additional constraints or external data structures. To make a string compatible with FreePascal, use FreePascal's own library routines.

It sounds like FreePascal requires something besides free() to happen when the refcount goes to zero, but it's likely impossible to tell what without some reverse engineering or digging into ABI specs.

Potatoswatter