I am trying to serialize PHP opcodes and am confused by the following line of code. What does it mean?

(ptr) = (void*)((char*)(ptr) - (char*)script->mem)

https://github.com/php/php-src/blob/6aaab9adf7619c121c19701022aeb8d88f9c3bab/ext/opcache/zend_file_cache.c#L112

How do I serialize an op_array?

Arshid KV

2 Answers

So, this is an old game coder's trick for serialising/deserialising a pointer to/from disk.

It's a little dirty, but let's see if I can explain it. Using a vastly oversimplified example, let's imagine I have this struct:

struct FileContents
{
  char text[10] = {0,1,2,3,4,5,6,7,8,9};
  char* ptr = text + 5; //< point to element 5
};

and I want to read/write that struct using fread/fwrite in one go. If I were to simply do:

void writeFile(FileContents contents)
{
  FILE* fp = fopen("blah.dat", "wb");
  fwrite(&contents, sizeof(FileContents), 1, fp);
  fclose(fp);
}

This would work fine for the values stored in contents.text, but would fail horribly for contents.ptr: the pointer holds a memory address, and it's unlikely the struct will land at that same address when we read the data back in.
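To see the problem concretely, here is a minimal sketch that stands in for the fwrite/fread round trip with a byte-wise copy. `Blob` and `copyKeepsOldAddress` are hypothetical names used only for illustration; they are not part of the code in the question.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the FileContents struct above.
struct Blob
{
  char text[10];
  char* ptr;
};

// Copy the raw bytes of 'src' into 'dst', much as fwrite followed by
// fread would. Returns true if the copied pointer still holds the
// address inside the ORIGINAL object rather than inside the copy.
bool copyKeepsOldAddress(const Blob& src, Blob& dst)
{
  // Precondition for this sketch: src.ptr points into src.text.
  assert(src.ptr >= src.text && src.ptr < src.text + sizeof(src.text));

  std::memcpy(&dst, &src, sizeof(Blob));
  return dst.ptr == src.ptr;
}
```

The copy's `ptr` is bit-for-bit identical to the original's, so it still refers to the old object. Once that object is gone (as it is after a reload from disk), the pointer dangles.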

As such, we need an unfix/refix operation on all the pointer values. We can achieve this by doing:

struct FileContents
{
  char text[10] = {0,1,2,3,4,5,6,7,8,9};
  void* ptr = (text + 5); //< point to element 5

  // convert 'ptr' to be an integer offset from the start of the struct
  void unfix()
  {
    // here's the first byte we will write to the file
    char* startOfFile = (char*)this;

    // here's the problematic pointer value. (ptr is a void*, so C++
    // needs an explicit cast to char*)
    char* ptrValue = (char*)ptr;

    // now let's compute a byte offset from the start of the struct,
    // to the memory location ptr is pointing to...
    // (in this case, the offset will be 5)
    size_t offset = ptrValue - startOfFile;

    // now let's modify the value of ptr so that it now stores a byte
    // offset, rather than a memory location. (We need to cast the
    // offset to a pointer value, otherwise this won't work)
    ptr = (void*)offset;
  }

  // AFTER reading the file (deserialise), we need to convert
  // that integer offset back into a valid memory address...
  void refix()
  {
    // grab start of struct in memory
    char* startOfFile = (char*)this;

    // and add the offset to the start of the file, to
    // get the valid memory location
    ptr = startOfFile + ((size_t)ptr);
  }  
};
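The round trip can be sketched end to end as follows. This is only an illustration under stated assumptions: `Packed` and `relocate` are hypothetical names, free functions replace the member functions so the idea carries over to plain C more easily, and a `memcpy` to a second object stands in for writing to disk and loading at a different address.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical variant of the struct above.
struct Packed
{
  char text[10];
  void* ptr; // a real address, or a byte offset while "unfixed"
};

// Replace the pointer with its byte offset from the start of the struct.
void unfix(Packed& p)
{
  char* base = reinterpret_cast<char*>(&p);
  char* target = static_cast<char*>(p.ptr);
  // Only pointers into the block itself can be stored as offsets.
  assert(target >= base && target < base + sizeof(Packed));
  std::uintptr_t offset = static_cast<std::uintptr_t>(target - base);
  p.ptr = reinterpret_cast<void*>(offset);
}

// Turn the stored offset back into an address inside this copy.
void refix(Packed& p)
{
  char* base = reinterpret_cast<char*>(&p);
  p.ptr = base + reinterpret_cast<std::uintptr_t>(p.ptr);
}

// Simulate "write to disk, load elsewhere" with a byte-wise copy.
void relocate(Packed& from, Packed& to)
{
  unfix(from);
  std::memcpy(&to, &from, sizeof(Packed));
  refix(from); // the original stays usable
  refix(to);   // the copy now points into its own text
}
```

After `relocate(a, b)`, `b.ptr` refers to `b.text + 5` even though `b` lives at a different address than `a`. The assertion in `unfix` makes the technique's main limitation explicit: it only works for pointers that land inside the block being saved.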
robthebloke
  • Be wary of using C++ notations in answers to C questions. Your initializers in the structure and your inclusion of a function definition within the scope of a structure definition are both C++ rather than C. – Jonathan Leffler Jan 03 '20 at 06:37
  • Thank you... What is the new method for writing a complex struct (op_array) to disk? – Arshid KV Jan 03 '20 at 07:12
  • This will only work for pointers to memory addresses within the struct, right? – LegendofPedro Jan 04 '20 at 05:55
  • @LegendofPedro That's correct. It's basically a way to load an entire asset off disk into a single memory block, and still be able to use pointers within that data to provide quick access (instead of having to compute those offsets and addresses). – robthebloke Jan 05 '20 at 22:45
  • @ArshidKV I'm not sure of the context in the code linked with this question, but in general writing this data to disk tends to be very standard (i.e. sometimes writing a struct member at a time). Once the data has been written to disk, it's from that point onwards that this technique becomes useful. (It's quick to load from disk, the data is contiguous [which helps the prefetcher], and it's trivial to move assets around in memory: just unfix/refix before/after memcpy). – robthebloke Jan 05 '20 at 22:56
  • This technique only tends to be helpful for file data that is constant when loaded, doesn't need to be re-evaluated often, and would be nice to have quick load & lookup times. It looks as though the code you have linked to forms some form of program database for a debugger (or for an interpreter)? That kinda makes sense. Generate the pdb once when linking, and then you have a very fast to load/query data structure for use by the debugger (or interpreter) – robthebloke Jan 05 '20 at 23:01
  • Please check https://stackoverflow.com/questions/59463793/how-to-save-nested-c-struct-data-to-disk – Arshid KV Jan 06 '20 at 08:12
It seems that you have a void* ptr and a sometype* script->mem. You cast both to char*, subtract one from the other to get a byte offset, and then cast the result back to void*.
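Written out as functions, the expression from the question might look like the sketch below. The names `toOffset` and `fromOffset` are hypothetical, and the inverse operation is an assumption about how such an offset would be turned back into a pointer; neither is taken from the PHP source. `mem` plays the role of script->mem.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Mirrors (ptr) = (void*)((char*)(ptr) - (char*)script->mem):
// the result is a byte offset from 'mem', stored in a pointer-typed value.
void* toOffset(void* ptr, void* mem)
{
  std::ptrdiff_t offset = static_cast<char*>(ptr) - static_cast<char*>(mem);
  assert(offset >= 0); // ptr is assumed to point into the block at 'mem'
  return reinterpret_cast<void*>(static_cast<std::uintptr_t>(offset));
}

// Assumed inverse: 'buf' is wherever the block was loaded, which may
// differ from the original 'mem'.
void* fromOffset(void* stored, void* buf)
{
  return static_cast<char*>(buf) + reinterpret_cast<std::uintptr_t>(stored);
}
```

For a pointer 12 bytes into `mem`, `toOffset` yields a value whose integer representation is 12, and `fromOffset` with a different base `buf` yields `buf + 12`.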

bobra