
Firstly, apologies if this question has been asked before or has a glaringly obvious solution that I cannot see. I have found a similar question; however, I believe what I am asking goes a little further than what was previously asked.

I have a structure as follows:

typedef struct {
        int id;
        char *title;
        char *body;
} journal_entry;

Q: How do I write data referenced by pointers to a file, and load it back, in C (not C++) without using fixed lengths?

Am I wrong in thinking that by writing title or body to a file I would end up with junk data and not the information I had actually stored? I do not know how long the title or body of a journal entry will be, and the length may vary significantly from entry to entry.

My own reading suggests that I will need to dereference the pointers and fwrite each part of the struct separately. But I'm uncertain how to keep track of the data and the structs without things becoming confused, particularly for larger files. Furthermore, if these are not the only items I intend to store in the file (for example, I may wish to include small images later on), I'm uncertain how I would order the file structure for convenience.

The other (possibly perceived) problem is that I have used malloc to allocate memory for the title and body strings. When loading the data, how will I know how much memory to allocate for each string when I wish to load the entry again? Do I need to expand my struct to include int body_len and int title_len?

Guidance or suggestions would be very gratefully received.

Chortle

3 Answers


You are correct that writing this structure to a file as-is is not a good idea: the file would only contain the pointer values, and once the strings to which your pointers point are gone (e.g. after the program exits), there is no way to retrieve them. From a practical point of view, one option is to declare strings of fixed length (if you know that your strings have a length limit):

typedef struct {
        int id;
        char title[MAX_TITLE_LENGTH];
        char body[MAX_BODY_LENGTH];
} journal_entry;

If you need to allocate title and body with malloc, you can have a "header" element that stores the length of the whole record. When you write your structure to the file, you write this length first; when you read it back, you use it to figure out how many bytes you need to read (and allocate).

For example, to write:

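#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* je is a journal_entry *, filename is your file name */
FILE *fp = fopen(filename, "wb");
size_t title_len = strlen(je->title) + 1;   /* +1 for the terminating '\0' */
size_t body_len = strlen(je->body) + 1;
size_t size = sizeof(je->id) + title_len + body_len;
fwrite(&size, sizeof(size), 1, fp);         /* header: length of the rest of the record */
fwrite(&je->id, sizeof(je->id), 1, fp);
fwrite(je->title, sizeof(char), title_len, fp);
fwrite(je->body, sizeof(char), body_len, fp);
fclose(fp);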

To read (not particularly safe implementation, just to give the idea):

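FILE *fp = fopen(filename, "rb");   /* filename is your file name */
size_t size;
size_t read_bytes = 0;
journal_entry je;                   /* journal_entry is a typedef, so no "struct" keyword */
fread(&size, sizeof(size), 1, fp);
char *buf = malloc(size);           /* char * so that buf + n is standard C (no void * arithmetic) */
fread(buf, size, 1, fp);
fclose(fp);
memcpy(&je.id, buf, sizeof(je.id)); // might break if you wrote your file on an OS with different endianness
read_bytes += sizeof(je.id);
je.title = buf + read_bytes;
read_bytes += strlen(je.title) + 1;
je.body = buf + read_bytes;
// je.title and je.body point into buf: keep buf alive while using je, then free(buf)
// the other way would be to malloc je.title and je.body, copy the strings into them and free buf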
Ashalynd

(I am focusing on a Linux point of view, but it could be adapted to other systems)

Serialization

What you want to achieve is often called serialization (quoting Wikipedia), or marshalling:

The serialization is the process of translating data structures or object state into a format that can be stored and reconstructed later in the same or another computer

Pointer I/O

It is in principle possible to read and write pointers, e.g. with the %p conversion specification of fprintf(3) and fscanf(3) (and you might directly write and read a pointer, which at the machine level is like some intptr_t integer). However, a given address (e.g. 0x1234F580 ...) is likely to be invalid or have a different meaning when read again by a different process (e.g. because of ASLR).

Serialization of aggregate data

You might use a textual format like JSON (and I actually recommend doing so), another format like YAML, or perhaps invent your own (e.g. inspired by s-exprs). Preferring textual formats to binary ones (like XDR, ASN.1, ...) is a well-established habit, one that Unix has had since before 1980. And many protocols (HTTP, SMTP, FTP, JSON-RPC, ...) are textual protocols.

Notice that on current systems, I/O is much slower than computation, so the relative cost of textual encoding and decoding is tiny compared with network or disk I/O (see table of Answers here).

The encoding of aggregate data (e.g. a struct in C) is generally compositional: by composing the encodings of elementary scalar data (numbers, strings, ...) you can encode a higher-level data type.

Serialization libraries

Most formats (notably JSON) have several free software libraries to encode and decode them, e.g. Jansson (for C), JsonCpp (for C++), etc.

Suggestion:

Use JSON and format your journal_entry perhaps into a JSON object like

{ "id": 1234,
  "title": "Some Title Here",
  "body": "Some body string goes here" }

Concretely, you would use some JSON library to first convert your journal_entry into that library's JSON type (and vice versa), then use the library to encode/decode that JSON.

Databases

You could also consider a database approach (e.g. SQLite).


PS. Serialization of closures (or anything containing pointers to code) may be challenging. You'll need to define what exactly that means.

PPS. Some languages provide built-in support for serialization and marshalling. For example, OCaml has a Marshal module and Python has pickle.

Basile Starynkevitch

In memory you can store strings as pointers to arrays, but in a file on disk you would typically store the data directly. One easy way to do it would be to store a uint32_t containing the size, then store the actual bytes of the string. You could also store null-terminated strings in the file and simply scan for the null terminator when reading them. The first method makes it easier to preallocate the needed buffer space when reading, without needing to pass over the data twice.

John Zwinck