C: How does fread read different data blocks in a binary file?

Question

I'm porting some C code to C#. I know little about C, but I'm flexible and can learn new programming languages. Still, I wasn't able to figure out the exact behaviour of the code I'm porting.

I've read about fread() on the web. The line in question is:

  fread(&(targetObj->data), sizeof(TestObj), 1, file);

Now, file is a big binary file with lots of data in it.

What I want to know is how I can do this in C#. Let me explain:

I think that line of code does this:

- TestObj is an unsigned short
- it reads, one time, a chunk of data the size of TestObj (unsigned short)
- it reads it from file (which is a pointer to a binary file on the filesystem) into targetObj->data

What I don't understand is:

I have a big binary file, so what does it actually read? Are there headers somewhere that define where an unsigned-short-sized chunk of data is written?

Where in the binary file does it take that object from? How can I know how to read it back from the binary file in C#? Maybe C knows where to pick up that single unsigned short, but I don't in C#.

For example, if that binary file has 40 unsigned shorts saved in it, does the C line above read just the first one?

and if I do

fread(&(targetObj->data), sizeof(TestObj), 5, file);

is it expected that targetObj->data is an array of 5 unsigned shorts? And will the code read the first 5 unsigned shorts that it finds in the whole binary file?

I can't wrap my head around this, but I need to know how C recognizes that unsigned short in a big binary file whose content I don't know, and I can't think of how to say in C#: read the first C unsigned short from that file.

Had a good laugh :) I thought "threads" was misspelled but learned something new! — Zimano, Oct 02 '17 at 07:18
Is that a question about `fread`, or about what a binary file is? It is a raw sequence of bytes (and their interpretation or parsing is the job of the application). You need to *document* your file format! — Basile Starynkevitch, Oct 02 '17 at 07:50
@Zimano first let's understand fread, then we will talk about threads :D — Liquid Core, Oct 02 '17 at 08:29
@BasileStarynkevitch It was a question on how fread works. File format is not mine, I'm porting pre existing code with little or no documentation, so that's all I can do about the file format :P — Liquid Core, Oct 02 '17 at 08:31

score 2 · Accepted Answer · answered Oct 02 '17 at 07:13

fread just reads the specified number of bytes from the current file cursor position, and advances the file cursor (or "file pointer", but not to be confused with a C pointer).
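To make the cursor behaviour concrete, here is a minimal sketch. The helper name `read_sequential` is made up for illustration, and it assumes the file holds plain unsigned shorts written by the same machine. Each call picks up where the previous one stopped, so there is no header telling fread where to look:

```c
#include <stdio.h>

/* Hypothetical helper: reads `count` consecutive unsigned shorts, one fread
   call each, to show that every call continues where the last one stopped.
   Returns the number of values actually read. */
size_t read_sequential(FILE *file, unsigned short *out, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        long before = ftell(file);          /* cursor position before the read */
        if (fread(&out[i], sizeof out[i], 1, file) != 1)
            break;                          /* end of file or read error */
        printf("offset %ld -> value %hu (cursor now at %ld)\n",
               before, out[i], ftell(file));
    }
    return i;
}
```

With a file that holds 40 unsigned shorts, the first call returns the first value, the second call the second value, and so on, until fread reports that nothing is left.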

So if sizeof(TestObj) is 2, it will read two bytes and place them at the address &(targetObj->data), with no bounds checking, and regardless of any difference between your architecture's endianness and the file protocol's endianness. Note that this approach is not a platform-independent way of parsing files containing numbers in binary form, since a number might be stored differently on your machine than it is stored inside the file (by whoever designed the binary protocol you are trying to read).
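One way around the endianness problem is to read the raw bytes and assemble the number yourself. The sketch below assumes a 16-bit value; the byte order of your actual file is not known from the question, so both variants are shown, and the function names are made up for illustration:

```c
#include <stdio.h>
#include <stdint.h>

/* Reads a 16-bit unsigned value stored least-significant byte first.
   Returns 1 on success, 0 on end of file or error. */
int read_u16_le(FILE *file, uint16_t *out)
{
    unsigned char bytes[2];
    if (fread(bytes, 1, sizeof bytes, file) != sizeof bytes)
        return 0;
    *out = (uint16_t)(bytes[0] | (bytes[1] << 8));
    return 1;
}

/* Same, but for a file that stores the most-significant byte first. */
int read_u16_be(FILE *file, uint16_t *out)
{
    unsigned char bytes[2];
    if (fread(bytes, 1, sizeof bytes, file) != sizeof bytes)
        return 0;
    *out = (uint16_t)((bytes[0] << 8) | bytes[1]);
    return 1;
}
```

Because the byte order is spelled out in the code, the result no longer depends on the machine that runs it. The same byte-by-byte approach carries over to C# if you read two bytes from a stream and combine them.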

In C#, you might achieve a similar thing by manually specifying struct packing and field placement, although the code will suffer from the same problems as your C code.
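The packing concern exists on the C side as well, when the code reads a whole struct with one fread. The struct `Record` below is purely hypothetical (it is not from your code); the sketch only shows how to inspect what layout your compiler chose:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct, only for illustration. */
struct Record {
    unsigned char tag;
    unsigned int  value;
};

/* Number of padding bytes the compiler added to struct Record. */
size_t record_padding(void)
{
    return sizeof(struct Record) - (sizeof(unsigned char) + sizeof(unsigned int));
}

/* Prints the layout, so it can be compared with what the file contains. */
void print_record_layout(void)
{
    printf("sizeof(struct Record) = %zu\n", sizeof(struct Record));
    printf("offset of tag         = %zu\n", offsetof(struct Record, tag));
    printf("offset of value       = %zu\n", offsetof(struct Record, value));
    printf("padding bytes         = %zu\n", record_padding());
}
```

Running something like this against the real structs, with the original compiler and settings, tells you which offsets the C# side has to reproduce.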

answered Oct 02 '17 at 07:13

vgru


So basically, to be able to understand the file structure without going with trial and error and shots in the dark, I should install some C ide/compiler and be able to debug the actual non ported code to see what happens and what values are returned to the binary file, then reconstruct the file format back in C# code. Right? – Liquid Core Oct 02 '17 at 08:55
Well, if you have the actual C code, you might be able to port it to C# without a debugger (until the moment you make a mistake while porting and get the need to debug), you just need to be careful to ensure that struct packing/padding is ported properly to C#, if the C code reads entire structs in one go. My point was that the C code itself is not portable, meaning that it wouldn't work if your machine is little endian, but the protocol is big endian. If the C code is working on your machine, then the C# code will work also. – vgru Oct 02 '17 at 09:10
So, depending on your C compiler settings, C might pad all struct members to nearest multiples of `int`, inserting bytes as needed, or it might pack the members tightly, to make the struct smaller. C# will do whatever it likes, unless you specify the packing manually using `StructLayout`/`FieldOffset`. [This thread](https://stackoverflow.com/q/2384/69809) has some usual ways of doing it. – vgru Oct 02 '17 at 09:16
Finally, if you are unsure how the C code works just by inspecting it, and you don't have the protocol specs, then you might need to use the debugger. After unit testing the code, I would definitely create functional tests, i.e. take some existing binary files with known output, and check if the C# code produces the same result. – vgru Oct 02 '17 at 09:18
It's very hard as I'm on Windows and that C code is for Linux. I think the StructLayout will be very hard given that I don't really know what that file contains and I don't know what to expect. I think it will be very annoying also because the code isn't very clear on what to do with those values – Liquid Core Oct 02 '17 at 10:25
Well, I'll try to marshal everything manually and see what happens. Thanks for the advice, at least I understood how it works – Liquid Core Oct 03 '17 at 15:17

score 1 · Answer 2 · answered Oct 02 '17 at 07:10

fread reads from the current position in the stream; see also ftell and fseek. The equivalent in C# would be Stream.Read.
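As a rough illustration of how fseek and fread work together, the sketch below jumps to a given value before reading it. It assumes the file is nothing but unsigned shorts stored back to back, which may not match your file; `read_nth_ushort` is a made-up name:

```c
#include <stdio.h>

/* Reads the unsigned short at zero-based position `index`, assuming the file
   contains only unsigned shorts packed back to back (an assumption).
   Returns 1 on success, 0 on failure. */
int read_nth_ushort(FILE *file, long index, unsigned short *out)
{
    long offset = index * (long)sizeof *out;    /* byte offset of that value */
    if (fseek(file, offset, SEEK_SET) != 0)     /* move the cursor there */
        return 0;
    return fread(out, sizeof *out, 1, file) == 1;
}
```

After a successful call, ftell reports the position just past the value that was read.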

answered Oct 02 '17 at 07:10

Nahuel Fouilleul


score 0 · Answer 3 · answered Oct 02 '17 at 07:07

From man fread

 size_t
 fread(void *restrict ptr, size_t size, size_t nitems, FILE *restrict stream);

 The function fread() reads nitems objects, each size bytes long, from the stream pointed to by stream, storing them at the location given by ptr.

sizeof(short) is resolved by the compiler, as per https://stackoverflow.com/a/14171152/6204612

And C does not do any pretty conversions for you. What is read is precisely sizeof(short) bytes, and these bytes are put into the destination variable (here targetObj->data). Whether that is correct or not is the implementer's responsibility. You need to manage offsets, collection sizes etc. on your own.
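For the case with a count of 5, a minimal sketch could look like this. It assumes the destination really is an array of at least 5 unsigned shorts; fread itself never checks that. `ITEM_COUNT` and `read_block` are names made up for illustration:

```c
#include <stdio.h>

#define ITEM_COUNT 5

/* Reads up to ITEM_COUNT unsigned shorts in a single fread call.
   Returns how many complete items were actually read. */
size_t read_block(FILE *file, unsigned short out[ITEM_COUNT])
{
    size_t got = fread(out, sizeof out[0], ITEM_COUNT, file);
    if (got < ITEM_COUNT) {
        /* A short count means end of file or an error, not a crash. */
        if (ferror(file))
            fprintf(stderr, "read error after %zu items\n", got);
        else if (feof(file))
            fprintf(stderr, "end of file after %zu items\n", got);
    }
    return got;
}
```

The return value is the number of complete items read, so comparing it with the requested count is the only way to notice that the file ended early.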
