
I'm very surprised by this behaviour. I have to be doing something wrong, but I can't find out what it is.

I had a 133×21 table in a .xml file and converted it to .csv with Excel. No information was lost in the conversion.

Then I wrote a simple program that reads that table into different structs:

typedef struct{
    float xval;
    float yval;
    float zval;
} tTuple_float;

typedef struct{
    int A;
    int B;
    int C;
} tTuple_int;

The program is this:

#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#define MAXCHAR 1000

int main(void) {
    tTuple_float C1[133], C2[133], C3[133], C4[133], C5[133], C6[133];
    tTuple_int ref[133];
    FILE *fp;
    int i=0;
    char row[MAXCHAR];
    fp = fopen("filename.csv","r");
    if (fp==NULL){
        printf("Error opening file\n");
        return 1;
    }
    i=0;
    setbuf(stdout, NULL);
    while (i<133){
        fgets(row, MAXCHAR, fp);
        printf("%s", row);         //To compare with the row printed with the arrays
        sscanf(row, "%d;%d;%d;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f",
            &ref[i].A, &ref[i].B, &ref[i].C,
            &C1[i].xval, &C1[i].yval, &C1[i].zval,
            &C2[i].xval, &C2[i].yval, &C2[i].zval,
            &C3[i].xval, &C3[i].yval, &C3[i].zval,
            &C4[i].xval, &C4[i].yval, &C4[i].zval,
            &C5[i].xval, &C5[i].yval, &C5[i].zval,
            &C6[i].xval, &C6[i].yval, &C6[i].zval);
        printf("%d;%d;%d;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f;%f\n",
            ref[i].A, ref[i].B, ref[i].C,
            C1[i].xval, C1[i].yval, C1[i].zval,
            C2[i].xval, C2[i].yval, C2[i].zval,
            C3[i].xval, C3[i].yval, C3[i].zval,
            C4[i].xval, C4[i].yval, C4[i].zval,
            C5[i].xval, C5[i].yval, C5[i].zval,
            C6[i].xval, C6[i].yval, C6[i].zval);
        i++;
        setbuf(stdout, NULL);
    }
    fclose(fp);
    return 0;
}

I added that printf("%s", row); to compare the string I was getting from fgets() with the values I was saving using sscanf().

Looking at the first two rows:

We can see that:

  1. sscanf() doesn't work at all on the first row;
  2. in the second row, the first float, 715.973, is converted to 715.973022 instead of 715.973000;
  3. in the second row, the fifth float, 619.22, is converted to 619.219971 instead of 619.220000.

So in some cases the stored value is slightly too large, and in others slightly too small. After some digging on Stack Overflow, I understand that floats are inexact, but what I don't know is how to work around it. Is there a way to truncate the float, or what is the best way to round it to 3 decimals?

Other than that, any off-topic improvement to the code itself is more than welcome.

EDIT: Providing minimal workable example (MWE) as follows

#include <stdio.h>
#include <stdlib.h>
#define MAXCHAR 1000

typedef struct{
    float xval;
    float yval;
    float zval;
} tTuple_float;

typedef struct{
    int A;
    int B;
    int C;
} tTuple_int;

int main(void) {
    tTuple_float C1[3];
    tTuple_int  ref[3];
    FILE *fp;
    int i=0;
    char row[MAXCHAR];
    setbuf(stdout, NULL);
    fp = fopen("three_row.csv","r");
    if (fp==NULL){
        printf("Error opening file\n");
        return 1;
    }
    i=0;
    while (i<3){
        fgets(row, MAXCHAR, fp);
        printf("%s", row);
        sscanf(row, "%d;%d;%d;%f;%f;%f;",&ref[i].A, &ref[i].B, &ref[i].C,
                &C1[i].xval, &C1[i].yval, &C1[i].zval);
        printf("%d;%d;%d;%f;%f;%f\n",ref[i].A, ref[i].B, ref[i].C,
                C1[i].xval, C1[i].yval, C1[i].zval);
        i++;
        setbuf(stdout, NULL);
    }
    fclose(fp);
    return 0;
}

And the three_row.csv:

1;2;3;111.111;222.222;333.333
4;5;6;444.444;555.555;666.666
7;8;9;777.777;888.888;999.999

My console output when I run the MWE:

1;2;3;111.111;222.222;333.333
524294;0;-13376;0.000000;0.000000;0.000000
4;5;6;444.444;555.555;666.666
4;5;6;444.444000;555.554993;666.666016
7;8;9;777.777;888.888;999.999
7;8;9;777.776978;888.888000;999.999023

  • [Obligatory reading on feof](https://stackoverflow.com/questions/5431941/why-is-while-feoffile-always-wrong). – n. m. could be an AI Feb 27 '23 at 15:28
  • Edit the question to provide a [mre], including text that reproduces the problem. Do not post text as images; post direct text that other people can copy and use as input when reproducing the problem. – Eric Postpischil Feb 27 '23 at 15:34
  • In addition, `!= true` and `== true` are nearly always wrong in C too (on top of being un-idiomatic and redundant). Nobody has promised that `feof()` ever returns `true`. To check that a condition holds, use `if(condition)`, *never* `if(condition==true)` or anything similar. To check the inverse, use `if(!condition)`. – n. m. could be an AI Feb 27 '23 at 15:36
  • In the format commonly used for `float`, IEEE-754 binary32, the representable value nearest 715.973 is 715.9730224609375. There is no way to make `scanf` assign any value to a `float` closer than this. If you want to work with values closer to 715.973, you must use `double` or other formats (which may involve writing your own routines to support them). – Eric Postpischil Feb 27 '23 at 15:37
  • *"the sscanf() doesn't work at all."* ----> Not helpful at all. How does it not work? Why do you not check its return value? – Harith Feb 27 '23 at 15:37
  • @n.m. thanks for the tips. In this case, the loop condition can be changed to `i<133` (or a `for` loop) since I know the exact number of rows to read. – ATSlooking4things Feb 27 '23 at 15:50
  • The problem with the first line could be the BOM. Check the encoding with e.g. Notepad++, and *check the return value of `sscanf`*. – n. m. could be an AI Feb 27 '23 at 15:57
  • "*Is there any way to truncate the float*" --> you misunderstand the nature of the issue, which is that the `float` representation *is* truncated relative to the (infinite) binary representation that would be required to exactly represent the number whose decimal representation is, say, 715.973. You can round to fewer decimal digits for display, but you cannot get a `float` any closer to the target number. – John Bollinger Feb 27 '23 at 15:58
  • Is there any possibility that there's a BOM (byte order mark) at the start of the first line? Maybe encoded as UTF-8? It would account for why the first line goes awry. If you checked the return value from `sscanf()`, you'd know because it returned 0 instead of 21. – Jonathan Leffler Feb 27 '23 at 16:03
  • @n.m I opened the file in Notepad++ and it does say UTF-8 BOM in the bottom right. I'll try to find out what that means. – ATSlooking4things Feb 27 '23 at 16:10
  • I do not reproduce the behavior you describe for the first-row results of your simplified code with the provided example data. The reported behavior seems to show the first `sscanf()` failing to scan any fields, which you should check by examining the return value. Most likely, there is leading non-printing data in your CSV that `sscanf()` does not take for whitespace. A UTF-8 BOM would be an excellent candidate for that. – John Bollinger Feb 27 '23 at 16:17
  • To get rid of the BOM, select UTF-8 (no BOM) in the Encoding menu of Notepad++ and save. – n. m. could be an AI Feb 27 '23 at 16:20
  • @JohnBollinger yes, you are all correct. To fix this I just had to open the file in Notepad++, change the encoding from UTF-8-BOM to UTF-8, and save it. It's working now. Thanks to everyone. – ATSlooking4things Feb 27 '23 at 16:21

1 Answer


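As many pointed out in the comments, the issue was the encoding of my .csv. A file saved as UTF-8-BOM starts with a BOM (byte order mark): three bytes (EF BB BF) that fgets() placed at the start of the first row. sscanf() could not match those bytes against the first %d, so it converted nothing, and the first element of every array was left uninitialized.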
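
Several commenters suggested checking the return value of sscanf(), which would have exposed the failure immediately. The following is a minimal sketch of that check for the six-field MWE layout; `parse_row` is a hypothetical helper name, not something from my original program.

```c
#include <stdio.h>

/* Hypothetical helper: parses one row of the MWE layout
   (three ints, then three floats, separated by ';').
   Returns the number of fields sscanf() converted: 6 on success,
   fewer if the row does not match the format. */
int parse_row(const char *row, int ints[3], float vals[3])
{
    return sscanf(row, "%d;%d;%d;%f;%f;%f",
                  &ints[0], &ints[1], &ints[2],
                  &vals[0], &vals[1], &vals[2]);
}
```

In the reading loop, a call such as `if (parse_row(row, ints, vals) != 6)` can report the bad row instead of printing uninitialized values. For a row that starts with a BOM, the return value is 0, because the very first `%d` fails to match.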

To solve it, I just had to open the .csv in Notepad++, change the encoding to UTF-8 (no BOM), and save it.
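
If re-saving the file is not an option, the program itself can tolerate the BOM. This is only a sketch under the assumption that the file is UTF-8 and that the BOM, if present, is the three bytes EF BB BF at the very start of the first line; `skip_utf8_bom` is a hypothetical helper name.

```c
#include <string.h>

/* Hypothetical helper: returns a pointer just past a leading UTF-8 BOM
   (bytes EF BB BF), or s itself when no BOM is present.
   strncmp() stops at the terminating '\0', so short strings are safe. */
const char *skip_utf8_bom(const char *s)
{
    static const char bom[] = "\xEF\xBB\xBF";
    return strncmp(s, bom, 3) == 0 ? s + 3 : s;
}
```

It would be applied only to the first line read, for example by passing `skip_utf8_bom(row)` to sscanf() when `i == 0`. Later lines never contain a BOM, so they can be parsed unchanged.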

As for the second problem, the float inaccuracy, in John Bollinger's words: "You can round to fewer decimal digits for display, but you cannot get a `float` any closer to the target number."
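
In practice that leaves two options, both mentioned in the comments: round only when printing, or store the values as `double` so the error falls well below the digits being displayed. A minimal sketch of both follows; `format_3dp` and `read_double` are hypothetical helper names chosen for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: writes value to out with exactly three decimals.
   A float argument is promoted to double, so it works for both types.
   Returns 0 on success, -1 if out is too small or formatting fails. */
int format_3dp(double value, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "%.3f", value);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}

/* Hypothetical helper: reads one number as a double.
   Note the "%lf" conversion; "%f" is only correct for float targets.
   Returns 0 on success, -1 if no number could be converted. */
int read_double(const char *text, double *value)
{
    return sscanf(text, "%lf", value) == 1 ? 0 : -1;
}
```

With `format_3dp`, the float that printed as 619.219971 is displayed as 619.220. With `double` fields and `%lf` in the sscanf() format, 715.973 prints as 715.973000 even with plain `%f`. The stored value is still not exactly 715.973, but the difference is far smaller than six decimals can show.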