-2

[Update 2016.03.17] Sorry, I skipped the error checking for simplicity. I have now added the checks; here is the complete code.

#define MAX_DATA 512
#define MAX_ROWS 100
// Address must be defined before Database, which embeds it.
typedef struct Address {
    int id;
    int set;
    char name[MAX_DATA];
    char email[MAX_DATA];
} Address;

typedef struct Database {
    Address rows[MAX_ROWS];
    int num; // Number of records in the DB.
} Database;

Database *db_ptr;
FILE *file;
int return_value;
// Write to file
db_ptr = (Database*)malloc(sizeof(Database));
if(!db_ptr) {
    printf("Memory error!\n");
}
for(int i = 0; i < MAX_ROWS; i++) {
    db_ptr->rows[i].id = i;
    db_ptr->rows[i].set = 0;
}
char *filename = "test.db";
file = fopen(filename, "w");
if(!file) {
    printf("Open(w) %s fail!\n", filename);
}
return_value = fwrite(db_ptr, sizeof(Database), 1, file);
printf("The return value from fwrite = %d\n", return_value);
free(db_ptr);
fclose(file);
// Read from file
db_ptr = (Database*)malloc(sizeof(Database));
if(!db_ptr) {
    printf("Memory error!\n");
}
file = fopen(filename, "r+");
if(!file) {
    printf("Open(r+) %s fail!\n", filename);
}
return_value = fread(db_ptr, sizeof(Database), 1, file);
printf("(1)The return value from fread = %d\n", return_value);
rewind(file);
return_value = fread(db_ptr, 1, sizeof(Database), file);
printf("(2)The return value from fread = %d\n", return_value);
printf("Sizeof(Database) = %lu\n", (unsigned long)sizeof(Database)); // cast: size_t is not unsigned long on 64-bit Windows
free(db_ptr);
fclose(file);

The results are

"The return value from fwrite = 1"
"(1)The return value from fread = 1"
"(2)The return value from fread = 103204"
"Sizeof(Database) = 103204"

on Ubuntu 15.04 (64-bit) using gcc with -std=c99, and

"The return value from fwrite = 1"
"(1)The return value from fread = 0"
"(2)The return value from fread = 26832"
"Sizeof(Database) = 103204"

on Windows 7 (64-bit) using MinGW with -std=c99.

The size of test.db is 103204 bytes on Ubuntu and 103205 bytes on Windows. On Windows, the program seems to fail to read the whole Database structure.

My question: why does this program behave differently in different environments?

mingpepe

2 Answers

3

C structs are representations of data that are intended to be used only in memory. As soon as your data "leaves" memory in any way, you need to employ marshalling (often also called serialization; the differences are not worth discussing here).

In order to do so, you need to convert your data into a format suitable for the medium you transfer it to, the disk in this case.

A struct's in-memory layout is not suitable for this, mainly because of these issues:

  • Padding. Structures are padded to align members to addresses where they can be accessed faster or, on some architectures, accessed at all.
  • Endianness. The byte order in which values longer than a single byte are stored. This varies between architectures.
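Both properties can be inspected on a given compiler and machine. The following is a minimal sketch, not part of the original answer; `struct Sample`, `padding_after_tag` and `is_little_endian` are hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: a 1-byte member followed by a 4-byte member. */
struct Sample {
    char tag;
    int32_t value;
};

/* Number of padding bytes the compiler inserted between tag and value. */
size_t padding_after_tag(void) {
    return offsetof(struct Sample, value) - sizeof(char);
}

/* Returns 1 if the lowest-order byte of an integer is stored first
   (little-endian), 0 otherwise. */
int is_little_endian(void) {
    uint32_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}
```

The values returned depend on the compiler, its options and the target architecture, which is exactly why the raw bytes of a struct are not a portable file format.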

I've written an answer with a C code example that addresses these issues, see here.

For these reasons, using fread and fwrite directly on the struct is flawed in this situation. You can use this style of disk storage temporarily, while your program is still running. As soon as the data could be shared or exchanged with different systems (versions of the same system, different machines, operating systems, libraries, ...) or is stored between consecutive runs of your program, you need to employ correct marshalling.

Concerning your specific case: The call to fread probably returns 0 because it is not able to read a complete item. sizeof(Database) might be different between these environments.

This is only a guess, though, because you seem to have no error checking in place, especially for opening the files. You could also look at what errno provides (consider using strerror).
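A minimal sketch of such checks could look like the following. The helper names `open_or_report` and `read_one` are hypothetical. It also assumes that fopen sets errno on failure, which POSIX systems and the common Windows runtimes do, although ISO C does not require it.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: opens a file and reports the reason on failure.
   Returns NULL on failure so the caller can stop instead of continuing. */
FILE *open_or_report(const char *path, const char *mode) {
    errno = 0;
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {
        fprintf(stderr, "fopen(%s, %s) failed: %s\n",
                path, mode, strerror(errno));
    }
    return fp;
}

/* Hypothetical helper: reads exactly one item of `size` bytes.
   Returns 0 on success, -1 on a short read or an I/O error. */
int read_one(void *dst, size_t size, FILE *fp) {
    if (fread(dst, size, 1, fp) == 1) {
        return 0;
    }
    if (feof(fp)) {
        fprintf(stderr, "short read: end of file before %lu bytes\n",
                (unsigned long)size);
    } else if (ferror(fp)) {
        fprintf(stderr, "read error on stream\n");
    }
    return -1;
}
```

Distinguishing feof from ferror after a failed fread tells you whether the stream simply ended early or an actual I/O error occurred.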

A very good point brought up by user3121023 and Peter is the open mode: on Windows, the data you read from or write to a file is transformed if the file isn't opened in so-called binary mode. For example, if the data you write contains a byte equal to '\n', Windows will insert an additional '\r' (carriage return) before it.
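The numbers in the question are consistent with this. As a hedged sketch (not part of the original answer), the code below rebuilds the question's database and searches its in-memory bytes for a given value. It assumes a 4-byte int, and it zero-fills the buffer with calloc so that the unused bytes are deterministic, which the question's malloc does not guarantee. It also relies on the Microsoft C runtime treating the byte 0x1A (Ctrl-Z) as end of file when reading in text mode. `make_db` and `first_offset_of` are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_DATA 512
#define MAX_ROWS 100

typedef struct Address {
    int id;
    int set;
    char name[MAX_DATA];
    char email[MAX_DATA];
} Address;

typedef struct Database {
    Address rows[MAX_ROWS];
    int num;
} Database;

/* Builds the same database as the question, but zero-filled. */
Database *make_db(void) {
    Database *db = (Database *)calloc(1, sizeof(Database));
    if (db == NULL) {
        return NULL;
    }
    for (int i = 0; i < MAX_ROWS; i++) {
        db->rows[i].id = i;
        db->rows[i].set = 0;
    }
    return db;
}

/* Offset of the first byte equal to `needle` in the in-memory image
   of the database, or -1 if that byte value does not occur. */
long first_offset_of(const Database *db, unsigned char needle) {
    const unsigned char *bytes = (const unsigned char *)db;
    for (size_t i = 0; i < sizeof(Database); i++) {
        if (bytes[i] == needle) {
            return (long)i;
        }
    }
    return -1;
}
```

On a little-endian machine with a 4-byte int, sizeof(Address) is 1032, so the first 0x1A byte is the id of row 26 at offset 26 * 1032 = 26832, which matches the count returned by the second fread on Windows. The first 0x0A byte is the id of row 10; a single '\r' inserted before it would account for the one extra byte in the Windows file (103205 instead of 103204).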

Daniel Jour
  • All good information. But it is not clear how that explains `fread` returning 0 after the `fwrite`. Could you address that in your answer? – kaylum Mar 16 '16 at 09:56
  • @kaylum Good point, tried to address this, though this is highly guessing since any form of error checking seems to be missing. – Daniel Jour Mar 16 '16 at 10:02
2

Firstly, you need to open the file for binary I/O (include a 'b' character in the mode string given to fopen()). Without that, the file is opened in text mode which, among other things, translates characters like newlines differently between systems.
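As a minimal sketch of that first point, the following two helpers write and read a block of bytes with "wb" and "rb". The names `save_blob` and `load_blob` are hypothetical and not from the question.

```c
#include <stdio.h>

/* Writes `size` bytes from `src` to `path` in binary mode.
   Returns 0 on success, -1 on failure. */
int save_blob(const char *path, const void *src, size_t size) {
    FILE *fp = fopen(path, "wb");   /* 'b': no newline translation */
    if (fp == NULL) {
        return -1;
    }
    size_t written = fwrite(src, 1, size, fp);
    int close_failed = fclose(fp);  /* fclose flushes; it can fail too */
    return (written == size && !close_failed) ? 0 : -1;
}

/* Reads exactly `size` bytes from `path` into `dst` in binary mode.
   Returns 0 on success, -1 on failure or a short read. */
int load_blob(const char *path, void *dst, size_t size) {
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return -1;
    }
    size_t got = fread(dst, 1, size, fp);
    fclose(fp);
    return got == size ? 0 : -1;
}
```

With these, bytes such as 0x0A, 0x0D and 0x1A pass through unchanged on every platform. On systems that do not distinguish text and binary streams, the 'b' has no effect.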

Second, and more significant, is hinted at by the fact that you need to use sizeof. The result yielded by sizeof is implementation-defined for anything except the char types, which have size 1 by definition. sizeof(int) is implementation-defined. sizeof also yields different values for struct types, since there may be padding between members of different types to meet the alignment requirements of all the struct members. A consequence of these things being implementation-defined in the C standard is that they vary between implementations (i.e. between compilers and host systems).

That explains why you are getting different sizes between systems - different amounts of data are being written or read by each operation, on different systems.

To deal with those things, you either need to use techniques that ensure everything is the same size (often possible with care, but difficult, because properties such as the size of int and struct padding can be changed by compilation options on some compilers), or do some marshalling (translating the struct types in memory to some consistent binary format for output) and unmarshalling (the reverse process). The input process needs to be the reverse of the output process: read the data from the file as an array of characters, then interpret it to reconstruct your struct.
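One possible shape for such marshalling is sketched below, under stated assumptions: a hypothetical `Record` type that is smaller than the question's Address, 32-bit integer fields, and a little-endian byte order chosen arbitrarily for the file format. None of these names come from the question.

```c
#include <stdint.h>
#include <string.h>

#define NAME_LEN 16                       /* hypothetical field width */
#define RECORD_BYTES (4 + 4 + NAME_LEN)   /* fixed on-disk record size */

typedef struct Record {
    int32_t id;
    int32_t set;
    char name[NAME_LEN];
} Record;

/* Stores v as 4 bytes, lowest-order byte first, on any host. */
static void put_u32_le(unsigned char *out, uint32_t v) {
    out[0] = (unsigned char)(v & 0xFF);
    out[1] = (unsigned char)((v >> 8) & 0xFF);
    out[2] = (unsigned char)((v >> 16) & 0xFF);
    out[3] = (unsigned char)((v >> 24) & 0xFF);
}

/* Reverse of put_u32_le. */
static uint32_t get_u32_le(const unsigned char *in) {
    return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}

/* Marshal: struct in memory -> fixed byte layout, no padding. */
void marshal_record(const Record *r, unsigned char out[RECORD_BYTES]) {
    put_u32_le(out, (uint32_t)r->id);
    put_u32_le(out + 4, (uint32_t)r->set);
    memcpy(out + 8, r->name, NAME_LEN);
}

/* Unmarshal: fixed byte layout -> struct in memory.
   Converting values above INT32_MAX back to int32_t is
   implementation-defined; typical compilers wrap around. */
void unmarshal_record(const unsigned char in[RECORD_BYTES], Record *r) {
    r->id = (int32_t)get_u32_le(in);
    r->set = (int32_t)get_u32_le(in + 4);
    memcpy(r->name, in + 8, NAME_LEN);
}
```

The byte buffer, not the struct, is what would be passed to fwrite and fread on a file opened in binary mode. For example, an id of 258 is always stored as the bytes 02 01 00 00, regardless of the host's endianness or struct padding.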

Even if two systems have the same sizes (for everything), there are concerns of how basic types (like int) are represented in memory e.g. endianness. That also needs to be handled with the method of input and output.

One simpler way is to use text mode files and formatted I/O. That has the advantage of the files being (largely) transportable, but also means a larger file.
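A minimal sketch of the formatted approach, assuming a hypothetical `Entry` type with one record per line and a name that contains no whitespace (a real format would need quoting or escaping):

```c
#include <stdio.h>

/* Hypothetical record type, smaller than the question's Address. */
typedef struct Entry {
    int id;
    int set;
    char name[64];
} Entry;

/* Writes one record as a text line: "id set name".
   Returns 0 on success, -1 on an output error. */
int write_entry(FILE *fp, const Entry *e) {
    return fprintf(fp, "%d %d %s\n", e->id, e->set, e->name) < 0 ? -1 : 0;
}

/* Reads one record written by write_entry.
   The %63s width keeps the name within the 64-byte buffer.
   Returns 0 on success, -1 if the line could not be parsed. */
int read_entry(FILE *fp, Entry *e) {
    return fscanf(fp, "%d %d %63s", &e->id, &e->set, e->name) == 3 ? 0 : -1;
}
```

Because numbers are stored as decimal digits, the file does not depend on the size, padding or byte order of the types used in memory.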

Peter
  • Good point about formatted I/O. This also has the benefit that the stored data is (to some extent) readable without using the program. – Daniel Jour Mar 16 '16 at 10:21
  • Opening the file for binary I/O works! But the results of sizeof(Database) are the same in this code. I am still curious about the reason for the different behavior in different environments. – mingpepe Mar 17 '16 at 01:45
  • Look up the meaning that terms like "implementation defined" have in the C standards. – Peter Mar 17 '16 at 10:36