
I have a file where I'm trying to read each line into a struct in C to further work with it.

The file looks like this:

Bread,212,2.7,36,6,9.8,0.01,0.01,10,500 
Pasta,347,2.5,64,13,7,0.01,0.01,6,500 
Honey,340,0.01,83,0.01,0.01,0.01,0.01,22.7,425 
Olive-oil,824,92,0.01,0.01,0.01,0.01,13.8,35,500 
White-beans,320,2.7,44,21,18,0.01,0.01,11,400 
Flaxseed-oil,828,92,0.01,0.01,0.01,52,14,100,100 
Cereal,363,6.5,58,13,9.9,0.01,0.01,11,1000 
Hazelnuts,644,61.6,10.5,12,0.01,0.09,7.83,16.74,252 

So I wrote a for-loop to iterate over the lines in the file, storing each value into the fields of a struct. I then print the fields of the struct, but it's already going wrong with the first argument, the string.

It is printing:

scanres: 1, name:  ■B, kcal: 0.00, omega 3: 0.00, omega 6: 0.00, carb: 0.00, protein: 0.00, fib: 0.00, price: 0.00, weight: 0.00g

Scanres should be 10, not 1, and the values should match the ones of the first line of the file.

I have tried with or without whitespace in front of the argument in the formatted string. Also I tried compiler warnings -Wall and -pedantic. No issues found.

What else could cause this problem?

The code looks like this:

#include <stdio.h>

#define MAX_CHAR 100
#define SIZE_OF_SHELF 8

typedef struct {
    char name[MAX_CHAR];
    double kcal, fat, omega_3, omega_6, carb, protein, fib, price, weight;
} Food;

int main(void) {
    int i = 0, scanres;
    Food Shelf[SIZE_OF_SHELF];
    FILE *fp;

    fp = fopen("foods.txt", "r");

    if (! fp) {
        printf("error loading file. bye.\n");
        return 0;
    }

    for (i = 0; !feof(fp); i++) {
        scanres = fscanf(fp, " %[^,],%lf,%lf,%lf,%lf,%lf,%lf,%lf,%lf,%lf ",
                         Shelf[i].name,
                         &Shelf[i].kcal, &Shelf[i].fat,
                         &Shelf[i].carb, &Shelf[i].protein,
                         &Shelf[i].fib, &Shelf[i].omega_3,
                         &Shelf[i].omega_6, &Shelf[i].price,
                         &Shelf[i].weight);
        
        printf("scanres: %d, name: %s, kcal: %.2f, omega 3: %.2f, omega 6: %.2f, carb: %.2f, protein: %.2f, fib: %.2f, price: %.2f, weight: %.2fg\n",
               scanres, Shelf[i].name, Shelf[i].kcal,
               Shelf[i].omega_3, Shelf[i].omega_6, Shelf[i].carb, 
               Shelf[i].protein, Shelf[i].fib, Shelf[i].price,
               Shelf[i].weight);
    }
    return 0;
}

If anybody can spot what I'm doing wrong, please let me know.

chqrlie
mcklmo
  • What editor did you use to make your text file? It's possible that the txt file is saved with an encoding that uses multiple bytes per character, and fscanf does not take it. I would try reading from console first, and copy-paste the lines to see if scanf reads them. – Sergey Kalinichenko Dec 11 '21 at 12:31
  • I'm using the normal windows Editor – mcklmo Dec 11 '21 at 12:33
  • Don't use [f]scanf() for anything non-trivial. – wildplasser Dec 11 '21 at 12:34
  • What else should I use? – mcklmo Dec 11 '21 at 12:37
  • There could be a Byte Order Mark (BOM) in the input file. Hexdump the file, and inspect the first three bytes. [ answer: you could read entire lines, using fgets(), and parse the lines, **or** do everything character-based (which is hard for floats) ] – wildplasser Dec 11 '21 at 12:38
  • How to do a Hexdump on Windows? Never heard that before – mcklmo Dec 11 '21 at 12:39
  • Windows was never intended as a serious programming environment. Ask Bill G. for some useful tools. – wildplasser Dec 11 '21 at 12:43
  • The file was actually the issue! I created a new file from scratch and it works. I think it was because the file originated from csv and got saved as a text file. Thanks for your help yall! Have a wonderful day. – mcklmo Dec 11 '21 at 12:47
  • See also [Why is “while ( !feof (file) )” always wrong?](https://stackoverflow.com/questions/5431941) – Steve Summit Dec 11 '21 at 13:15
  • @mcklmo *what else should I use?* Despite its many failings, `fscanf` is barely adequate for this task. But when you're ready to move beyond it, see [What can I use for input conversion instead of scanf?](https://stackoverflow.com/questions/58403537) – Steve Summit Dec 11 '21 at 13:24
  • @mcklmo If you do retain `fscanf` in your program, make sure you check the value of `scanres`, and only continue if it's 10, as you expect. – Steve Summit Dec 11 '21 at 15:16
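
The suggestions in these comments (read whole lines with fgets(), parse each one, and only accept a record when all 10 conversions succeed) could be sketched roughly as follows. This is a minimal sketch, not the asker's final code: the helper names `parse_food_line` and `load_foods` and the 256-byte line buffer are assumptions made for illustration, while the `Food` struct and the field order are taken from the question.

```c
#include <stdio.h>

#define MAX_CHAR 100

typedef struct {
    char name[MAX_CHAR];
    double kcal, fat, omega_3, omega_6, carb, protein, fib, price, weight;
} Food;

/* Hypothetical helper: parse one comma-separated line into *f.
   Returns 1 only if all 10 fields were converted, 0 otherwise. */
int parse_food_line(const char *line, Food *f) {
    /* %99[^,] caps the name at MAX_CHAR - 1 characters plus the terminator. */
    int n = sscanf(line, " %99[^,],%lf,%lf,%lf,%lf,%lf,%lf,%lf,%lf,%lf",
                   f->name, &f->kcal, &f->fat, &f->carb, &f->protein,
                   &f->fib, &f->omega_3, &f->omega_6, &f->price, &f->weight);
    return n == 10;
}

/* Hypothetical helper: read up to max records from fp.
   Stops at end of file or at the first line that does not parse,
   and returns the number of records stored. The loop is driven by
   the return value of fgets(), not by feof(). */
size_t load_foods(FILE *fp, Food *shelf, size_t max) {
    char line[256]; /* assumed to be long enough for one record */
    size_t count = 0;
    while (count < max && fgets(line, sizeof line, fp) != NULL) {
        if (!parse_food_line(line, &shelf[count])) {
            fprintf(stderr, "could not parse line %zu\n", count + 1);
            break;
        }
        count++;
    }
    return count;
}
```

In `main`, after the fopen() check, something like `size_t n = load_foods(fp, Shelf, SIZE_OF_SHELF);` would fill the array and also keep the loop from running past the end of `Shelf`. A file in the wrong encoding then shows up as an explicit parse error on line 1 rather than as garbage output.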

3 Answers


Check whether the file starts with a Byte Order Mark (BOM); for UTF-8 that is the first three bytes. You can use hexdump (or any binary editor) to inspect it.

File with BOM:


00000000  ef bb bf 42 72 65 61 64  2c 32 31 32 2c 32 2e 37  |...Bread,212,2.7|
00000010  2c 33 36 2c 36 2c 39 2e  38 2c 30 2e 30 31 2c 30  |,36,6,9.8,0.01,0|
00000020  2e 30 31 2c 31 30 2c 35  30 30 20 0a 50 61 73 74  |.01,10,500 .Past|
00000030  61 2c 33 34 37 2c 32 2e  35 2c 36 34 2c 31 33 2c  |a,347,2.5,64,13,|
...

File without BOM:


00000000  42 72 65 61 64 2c 32 31  32 2c 32 2e 37 2c 33 36  |Bread,212,2.7,36|
00000010  2c 36 2c 39 2e 38 2c 30  2e 30 31 2c 30 2e 30 31  |,6,9.8,0.01,0.01|
00000020  2c 31 30 2c 35 30 30 20  0a 50 61 73 74 61 2c 33  |,10,500 .Pasta,3|
00000030  34 37 2c 32 2e 35 2c 36  34 2c 31 33 2c 37 2c 30  |47,2.5,64,13,7,0|
...
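
If no hexdump tool is at hand (for example on Windows), the same check can be done from C. The following is a rough sketch under a few assumptions: the names `Bom`, `detect_bom` and `dump_head` are made up for this example, the file is expected to be opened in binary mode ("rb"), and only the UTF-8 and UTF-16 marks are recognised (a UTF-32 little-endian mark would be reported as UTF-16 little-endian here).

```c
#include <stdio.h>

typedef enum { BOM_NONE, BOM_UTF8, BOM_UTF16_LE, BOM_UTF16_BE } Bom;

/* Hypothetical helper: look at the first bytes of fp and report which
   byte order mark, if any, they form. fp is rewound before and after. */
Bom detect_bom(FILE *fp) {
    unsigned char b[3] = {0, 0, 0};
    size_t n;
    rewind(fp);
    n = fread(b, 1, sizeof b, fp);
    rewind(fp);
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return BOM_UTF8;
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return BOM_UTF16_LE;
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return BOM_UTF16_BE;
    return BOM_NONE;
}

/* Hypothetical helper: print the first n bytes of fp in hex,
   as a very small stand-in for hexdump. fp is rewound afterwards. */
void dump_head(FILE *fp, size_t n) {
    int c;
    rewind(fp);
    while (n-- > 0 && (c = fgetc(fp)) != EOF)
        printf("%02x ", (unsigned)c);
    putchar('\n');
    rewind(fp);
}
```

Calling `dump_head(fp, 16)` on a file opened with `fopen("foods.txt", "rb")` prints the leading bytes for comparison with the dumps above, and `detect_bom(fp)` gives a quick yes/no answer before any parsing is attempted.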
wildplasser
  • But why would a BOM cause the code to fail? – Steve Summit Dec 11 '21 at 13:20
  • I don't know. Maybe the `fscanf()` implementation is not 8-bit clean? – wildplasser Dec 11 '21 at 13:27
  • Actually, I can achieve more or less the same failure by not only giving `foods.txt` a BOM, but encoding it as UTF-16 (little-endian) instead of UTF-8. (I suppose anything's possible, but I have never heard of a "non 8-bit clean" version of `fscanf`!) [Also, thumbs up on choosing that more obvious hexdump format. :-) ] – Steve Summit Dec 11 '21 at 13:42
  • Could be. (I reverse-engineered the files for the hexdump from the OP's clean ASCII, so it is invalid evidence) – wildplasser Dec 11 '21 at 13:46
  • At any rate, that was a very nice call on the BOM, and I am going to have to remember this thread. We get questions every day about seemingly fine code that is perplexingly unable to read ordinary text files, and "wrong Unicode encoding" had never been on my menu of possibilities. – Steve Summit Dec 11 '21 at 13:49
  • I've encountered it first at a BOM inside an SQL script, resulting in an error message like :`'SELECT ... : invalid command 'SELECT' at line#1 ...`. Parsers can get very confused ... – wildplasser Dec 11 '21 at 13:57
  • I've had it worse. I'm guessing mcklmo saved that CSV file from Microsoft Excel. Microsoft loves UTF-16, and they love to use BOMs even in UTF-8 (where they are of course rather comically unnecessary). I discovered the hard way that if you save a TDF file from Excel in UTF-8, and if the first cell is empty, meaning that the first four bytes of the file are a BOM followed by a TAB, and if you cat this file in a MacOS Terminal window, it crashes Terminal! – Steve Summit Dec 11 '21 at 14:02

It's likely that, besides having a Byte Order Mark (BOM), the original copy of the foods.txt file was encoded using UTF-16, instead of ASCII or the more popular and compatible UTF-8. Taking a cue from wildplasser's answer, here is a hex dump of the first portion of the file in the little-endian variant of that encoding:

00000000  ff fe 42 00 72 00 65 00  61 00 64 00 2c 00 32 00  |..B.r.e.a.d.,.2.|
00000010  31 00 32 00 2c 00 32 00  2e 00 37 00 2c 00 33 00  |1.2.,.2...7.,.3.|
00000020  36 00 2c 00 36 00 2c 00  39 00 2e 00 38 00 2c 00  |6.,.6.,.9...8.,.|
00000030  30 00 2e 00 30 00 31 00  2c 00 30 00 2e 00 30 00  |0...0.1.,.0...0.|
00000040  31 00 2c 00 31 00 30 00  2c 00 35 00 30 00 30 00  |1.,.1.0.,.5.0.0.|
00000050  20 00 0a 00 50 00 61 00  73 00 74 00 61 00 2c 00  | ...P.a.s.t.a.,.|
00000060  33 00 34 00 37 00 2c 00  32 00 2e 00 35 00 2c 00  |3.4.7.,.2...5.,.|

The leading ff fe represents the byte order mark, and would account for the mysterious ■ that showed up in the output name: ■B. Thereafter, every other byte is 0, which is why "Bread" was printed as just "B": %[^,] stores every byte up to the comma, but printf's %s stops at the first 0 byte. After the comma, fscanf's first %lf finds a 0 byte instead of a digit and can't parse a double, which is why fscanf returns 1 instead of 10.
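
This can be checked with a small experiment. The sketch below is only an illustration: the helper names `make_utf16le_sample` and `scan_name_and_kcal` are invented here, the sample holds just the start of the first record ("Bread,212"), and a temporary file stands in for foods.txt.

```c
#include <stdio.h>

/* Hypothetical helper: write "Bread,212" as UTF-16LE with a BOM into a
   temporary file and return it positioned at the start
   (NULL if the temporary file cannot be created). */
FILE *make_utf16le_sample(void) {
    static const unsigned char bytes[] = {
        0xFF, 0xFE,                              /* byte order mark */
        'B', 0, 'r', 0, 'e', 0, 'a', 0, 'd', 0,  /* "Bread" */
        ',', 0, '2', 0, '1', 0, '2', 0           /* ",212"  */
    };
    FILE *fp = tmpfile();
    if (fp != NULL) {
        fwrite(bytes, 1, sizeof bytes, fp);
        rewind(fp);
    }
    return fp;
}

/* Hypothetical helper: apply the first two conversions of the question's
   format string (with a width limit added) and return fscanf's result. */
int scan_name_and_kcal(FILE *fp, char name[100], double *kcal) {
    return fscanf(fp, " %99[^,],%lf", name, kcal);
}
```

Calling `scan_name_and_kcal` on the sample file returns 1 rather than 2, leaves `*kcal` untouched, and fills `name` with a string that begins with the bytes ff fe followed by 'B' and then a 0 byte, which matches the symptoms in the question. How the two BOM bytes are displayed depends on the console's code page.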

Steve Summit
  • This also explains why none of `kcal`, `fat`, `omega_3`, `omega_6`, `carb`, `protein`, `fib`, `price`, `weight` have the expected values: `scanf()` stops converting after the first conversion and leaves them unchanged, hence their values are indeterminate as the `Shelf` array is uninitialized. – chqrlie Dec 11 '21 at 19:44

Copying the contents of the .txt file into a new .txt file solved the problem. The file originated as an .xls file; my guess is that this is where the weird BOM stuff mentioned by some of you comes from.

mcklmo