
I have seen other answers on this matter, but all of them rely on a std::stringstream, a temporary char or std::string array, or some external library. I would like to use only the fstream header to read a file that contains only numbers (char, short, and float values) separated by commas and spread over several lines; some values may belong to arrays or vectors. Example:

1,1.1,11.1,11
2,2.2,22.2,22
3,3.3,33.3,33
...

The order is known, since each line follows the variables from a struct. The number of lines may vary, but, for now, let's assume it is also known. Also for the sake of example, let's only consider this order, and these types:

int, double, double, int

Going along with a piece of code I have seen, I tried this simplistic (and, most probably, naive) approach:

int a, d;
double b, c;
char fileName {"file.txt"};
std::fstream fs {fileName};
if(!fs.is_open())
    // open with fs.out, write some defaults; this works, no need to mention
else
{
    char comma;
    while(fs.getline(fileName, 100, '\n'))
    {
        fs >> a >> comma >> b >> comma >> c >> comma >> d;
        std::cout << 2*a << ", " << 2*b << ", " << 2*c << ", " << 2*d << '\n';
    }
}

If the file has the three lines above, plus a terminating \n, it outputs this:

4, 4.4, 44.4, 44
6, 6.6, 66.6, 66
6, 6.6, 66.6, 66
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)

If I add a \n at the beginning of the file, it outputs:

2, 2.2, 22.2, 22
4, 4.4, 44.4, 44
6, 6.6, 66.6, 66
6, 6.6, 66.6, 66

If I remove the last \n, it works as intended. I have a few questions:

  1. What else can I do when writing the file besides adding a beginning \n and not inserting a terminating one in order to work as intended?

  2. If the number of variables is longer, say 100 per line, what can I do to avoid going 'round the Earth with fs >> a >> c >> ...?

  3. If I only need to read a specific line, or only a few, one method would probably be counting the occurrences of \n, or the lines, somehow. How could I do that?

(edit)

  4. Last but not least, as the title mentions, is it possible to do this without involving other headers, only with fstream (as it currently is, for example)?
a concerned citizen
  • You might want to go through your code again. It doesn't really make sense. – eesiraed Apr 08 '18 at 18:26
  • @FeiXiang Thank you for letting me know. I see I reversed two `<<` in the `fs >> ...` line. Is there something else I am missing? (to not edit too many times) – a concerned citizen Apr 08 '18 at 18:29
  • There was some post which changed locale. Search for it – Incomputable Apr 08 '18 at 18:30
  • @Incomputable I am not sure I follow. I tried now searching for "change locale" and I only get posts relevant to the localization settings. If it's a c++ domain, could you please specify? I am not advanced. – a concerned citizen Apr 08 '18 at 18:32
  • parsing csv file c++. Use that search string, you should stumble upon an answer which has csv_classification struct. – Incomputable Apr 08 '18 at 18:33
  • I'm sure `while(fs.getline(fileName, 100, '\n'))` isn't what you want. You're repeatedly trying to read a file name from the file into a char array which is likely too small. And your char array isn't an array, but a single char. Why do you even need the char array? Just directly give the file name to the `fstream` constructor. Since you're only doing input, you should use `ifstream` instead. Also, the extraction operator can only read with whitespace as a delimiter. You're going to have to use `getline()` first and then parse it later. – eesiraed Apr 08 '18 at 18:34
  • @FeiXiang I'm afraid that's not it. The char is only there to catch the commas (hence the name). I am using `getline()` to catch the lines of 100 chars, more than enough for this example, and stop at the terminating `\n`. While doing so, it captures the contents of the line (the `fs >> a >> ...` line) directly into the variables defined outside the `while`. As I mentioned, the code works, but not as I expect. Also, the first part of the `if()` says I am using `fs.out` to write defaults. It may be brute and naive, I am admitting, and that's why I am asking for help. – a concerned citizen Apr 08 '18 at 18:37
  • You are trying to extract a line and put it into the char array called `fileName`, basically discarding a line since you never read from `fileName`. – eesiraed Apr 08 '18 at 18:41
  • @FeiXiang Ah, I see the mistake now. I thought `fileName` is used as the opening file. Is there any way I can avoid using a temporary `char[]`, or `std::string` in there? I'm not fixed on `getline()`. – a concerned citizen Apr 08 '18 at 18:43
  • There is no way that I know of. Why are you so afraid of it anyway? A few copies won't hurt. – eesiraed Apr 08 '18 at 18:44
  • @FeiXiang It's not only the copies, looking at other examples, all of them use some sort of `char` to `int`, or `double`, etc. The way I did it seemed to work, even as wrong as it were. So, I had hopes... – a concerned citizen Apr 08 '18 at 18:51
  • @Incomputable It looks like this is the link you were talking about: https://stackoverflow.com/questions/25224688/csv-parsing-with-c/25225612 . It doesn't answer my question, but it looks like the answer is no. But it does help for the other 3 questions. Maybe together with the other answers I have found, I can concoct some usable code. – a concerned citizen Apr 08 '18 at 19:02
  • If you know the number and order of the fields, just use a `switch(field_no) { case 0: f >> a >> comma; if (f.eof()) goto done; break;...}`. – David C. Rankin Apr 08 '18 at 19:26
  • @aconcernedcitizen - Do you have any particular *reason* for not using the standard methods [How can I read and parse CSV files in C++?](https://stackoverflow.com/questions/1120140/how-can-i-read-and-parse-csv-files-in-c). If you are allergic to stringstreams, you can of course use a temporary disk file instead, but why would anyone want to do that? – Bo Persson Apr 08 '18 at 19:28
  • @BoPersson One of the answers I have read (among them your link) said that using `stringstream` can slow down the reading by almost 3 times, I can't find the post, but the example with `fs >> a >> ...` comes from there. So I thought, the simpler, the more efficient. Still, suppose the question 4 is answered (indirectly, 1, too), and it's a no. Because of that I am willing to do it in any of the many ways others have done it, so the other 2 questions remain. I am trying to read and adapt, but I am not a programmer, so I will make mistakes like this one, for example. – a concerned citizen Apr 08 '18 at 19:53
  • @DavidC.Rankin Suppose the number of lines is 100, would I have to use a one-hundred-liner `switch`? What if the number of lines varies? – a concerned citizen Apr 08 '18 at 19:54
  • @BoPersson I found the link: https://stackoverflow.com/questions/21778571/fastest-way-to-get-data-from-a-csv-in-c . He uses an intermediary `string`, others a `char` array, others `stringstream`, who knows how many ways are there. Given that most try to find a faster way, I thought maybe I can cut a corner or two. – a concerned citizen Apr 08 '18 at 20:14
  • @aconcernedcitizen - no, the switch is for the known number of fields... I'll drop a short example with your data. – David C. Rankin Apr 09 '18 at 02:50

1 Answer


The order is known, since each line follows the variables from a struct. The number of lines may vary, but, for now, let's assume it is also known. Also for the sake of example, let's only consider this order, and these types:

int, double, double, int

If the number and order of the fields are known, then you can simply read with >> or getline, using ',' or '\n' as the delimiter as required. While it is much wiser to use line-oriented input to read an entire line and then a stringstream to parse the fields, there is no reason you can't do the same thing utilizing only fstream, as you have indicated is your goal. It's not as elegant a solution, but a valid one nonetheless.

Using the >> Operator

Your data has four fields per line: the first three are delimited by a comma, the last by the newline. You can simply loop continually, read all four with the >> operator, and test the stream state with fail() after each read, e.g.

#include <iostream>
#include <fstream>

#define NFIELD 4
#define MAXW 128

int main (int argc, char **argv) {

    int a, d;
    double b, c;
    char comma;

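    if (argc < 2) {     /* a filename argument is required */
        std::cerr << "usage: " << argv[0] << " <filename>\n";
        return 1;
    }

    std::fstream f (argv[1]);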
    if (!f.is_open()) {
        std::cerr << "error: file open failed " << argv[1] << ".\n";
        return 1;
    }

    for (;;) {          /* loop continually */
        f >> a >> comma >> b >> comma >> c >> comma >> d;
        if (f.fail())   /* also testing eof() would drop a last line with no '\n' */
            break;
        std::cout << 2*a << "," << 2*b << "," << 2*c << "," << 2*d << '\n';
        f.ignore (MAXW, '\n');
    }
    f.close();
}

Alternatively, by keeping a simple field counter n, you can use a switch statement on the field number to read each value into the corresponding variable and, once all fields are read, output (or otherwise store) the 4 values that make up your struct. (Obviously you can also fill each member at the time it is read.) Nothing special is required, e.g.

#include <iostream>
#include <fstream>

#define NFIELD 4

int main (int argc, char **argv) {

    int a, d, n = 0;
    double b, c;
    char comma;

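    if (argc < 2) {     /* a filename argument is required */
        std::cerr << "usage: " << argv[0] << " <filename>\n";
        return 1;
    }

    std::fstream f (argv[1]);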
    if (!f.is_open()) {
        std::cerr << "error: file open failed " << argv[1] << ".\n";
        return 1;
    }

    for (;;) {          /* loop continually */
        switch (n) {    /* coordinate read based on field number */
            /* test fail(), not eof(): eof() drops a last line with no '\n' */
            case 0: f >> a >> comma; if (f.fail()) goto done; break;
            case 1: f >> b >> comma; if (f.fail()) goto done; break;
            case 2: f >> c >> comma; if (f.fail()) goto done; break;
            case 3: f >> d; if (f.fail()) goto done; break;
        }
        if (++n == NFIELD) {    /* if all fields read */
            std::cout << 2*a << "," << 2*b << "," << 2*c << "," << 2*d << '\n';
            n = 0;      /* reset field number */
        }
    }
    done:;
    f.close();
}

Example Input File

Using your provided sample input.

$ cat dat/mixed.csv
1,1.1,11.1,11
2,2.2,22.2,22
3,3.3,33.3,33

Example Use/Output

You obtain your desired output by simply doubling each field on output:

$ ./bin/csv_mixed_read dat/mixed.csv
2,2.2,22.2,22
4,4.4,44.4,44
6,6.6,66.6,66

(the output for both above is the same)
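
Since each line follows the members of a struct, the same >> chain can also be wrapped in an extraction operator for that struct. The following is only a minimal sketch under assumptions: the `Record` type and its member names are hypothetical (they are not in the question), and the operator simply rejects a record whose separators are not commas.

```cpp
#include <fstream>

// Hypothetical record type mirroring the int, double, double, int layout.
struct Record {
    int a;
    double b;
    double c;
    int d;
};

// Read one comma-separated record with >> only.
// On a misplaced separator the stream is put into the failed state
// and `r` is left untouched.
std::istream& operator>>(std::istream& in, Record& r)
{
    char c1 = 0, c2 = 0, c3 = 0;
    Record tmp{};
    if (in >> tmp.a >> c1 >> tmp.b >> c2 >> tmp.c >> c3 >> tmp.d) {
        if (c1 == ',' && c2 == ',' && c3 == ',')
            r = tmp;
        else
            in.setstate(std::ios_base::failbit);
    }
    return in;
}
```

Used as `Record r; while (f >> r) { /* use r */ }`, the loop ends at the first failed extraction, so it should not matter whether the file ends with a terminating \n.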

Using getline Delimited by ',' and '\n'

You can use a slight variation on the logic to employ getline. Here, you read the first 3 fields with f.getline(buf, MAXC, ','), and when the 3rd field is found, you read the final field with f.getline(buf, MAXC). For example,

#include <iostream>
#include <fstream>
#include <string>   /* std::stoi, std::stod */

#define NFIELD  4
#define MAXC  128

int main (int argc, char **argv) {

    int a = 0, d = 0, n = 0;
    double b = 0.0, c = 0.0;
    char buf[MAXC];

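    if (argc < 2) {     /* a filename argument is required */
        std::cerr << "usage: " << argv[0] << " <filename>\n";
        return 1;
    }

    std::fstream f (argv[1]);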
    if (!f.is_open()) {
        std::cerr << "error: file open failed " << argv[1] << ".\n";
        return 1;
    }

    while (f.getline(buf, MAXC, ',')) { /* read each field */
        switch (n) {    /* coordinate read based on field number */
            case 0: a = std::stoi (buf); break;
            case 1: b = std::stod (buf); break;
            case 2: c = std::stod (buf); 
                if (!f.getline(buf, MAXC))  /* read d with '\n' delimiter */
                    goto done;
                d = std::stoi (buf);
                break;
        }
        if (++n == NFIELD - 1) {    /* if all fields read */
            std::cout << 2*a << "," << 2*b << "," << 2*c << "," << 2*d << '\n';
            n = 0;      /* reset field number */
        }
    }
    done:;
    f.close();
}

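(note: std::stoi and std::stod skip leading whitespace, so a space following a comma is tolerated here as well; an empty or non-numeric field, however, makes them throw std::invalid_argument, whereas the >> operator would only set failbit.)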

Example Use/Output

The output is the same.

$ ./bin/csv_mixed_read2 dat/mixed.csv
2,2.2,22.2,22
4,4.4,44.4,44
6,6.6,66.6,66

Regardless of whether you use something like the examples above or a stringstream, you will have to know the number and order of the fields. Whether you use a loop with if..else if..else or a switch, the logic is the same: you need some way of coordinating each read with the correct field. Keeping a simple field counter is about as simple as anything else. Look things over and let me know if you have further questions.
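
For the case of many values per line (say 100), the field-by-field coordination above can be folded into a loop over an array. This is a rough sketch, assuming every field on the line can be held in a double; the helper name `read_row` is made up for illustration. Fields of mixed types would still need the switch (or a struct) to route each value.

```cpp
#include <fstream>

// Hypothetical helper: read `count` comma-separated numbers from the
// stream into `out`, using only the >> operator.
// Returns false at end of file or on a malformed field/separator.
bool read_row(std::istream& in, double* out, int count)
{
    char comma;
    for (int i = 0; i < count; ++i) {
        if (!(in >> out[i]))            // number missing or unreadable
            return false;
        if (i + 1 < count && !(in >> comma && comma == ','))
            return false;               // separator between fields missing
    }
    return true;
}
```

A caller would then write something like `double row[100]; while (read_row(f, row, 100)) { /* use row */ }`. Leading whitespace (including the previous line's \n) is skipped by >>, so no separate ignore call is needed between rows.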

David C. Rankin
  • My purpose is to generate a save file for the `struct` holding the input data and various other settings. There are different topics, each requiring the same `struct`. I can reinforce a rule of writing, such as csv, I can determine how many lines to save (nr. of topics), even though it will fill up in time as this will be refreshed after each use. That's why I said it will have a known number of lines and (ordered) values. I find the 1st example very appealing. I don't see on cppreference, but is there an `eol()`, too? Or maybe a way to count the `\n` (I'll probably have to use a `char`)? – a concerned citizen Apr 09 '18 at 05:42
  • There is no `eol()`, but `f.getline (buf, NCHARS, '\n')` allows for the same test. You can always add a simple `int n = 0;` to the first example and just increment it at the end of each loop. That would count the lines for you. Give it a go. If you have any problems, drop another comment and I'm happy to help. – David C. Rankin Apr 09 '18 at 06:00
  • Most probably I'll just use all the lines, and fill the unused ones with default values. For finding a certain line I thought of having a string with `{"one", "two", ...}` at the beginning of each line, but I think that will be cumbersome to have to check against strings. Or, looking at cppreference fstream.ignore(), I see I can use that as a way to determine the occurrences of `\n`, and make a separate function with a loop. Maybe not beautiful, but working (for now). (Just saw your comment) Thank you for the help. I'll mark this as the answer, but I may return with comments. :-) – a concerned citizen Apr 09 '18 at 06:01
  • Sure, that works, but anytime you have the choice between using a simple counter -- or using some other function -- take the counter route and avoid the overhead of a separate function call. (it's minimal, but can add up over a large application...) Also, instead of some number `NCHARS` in `ignore`, the proper constant is to include `<limits>` and use `std::numeric_limits<std::streamsize>::max()` for the number of characters to ignore. – David C. Rankin Apr 09 '18 at 06:02
  • Then I think a not very bad approach would be to combine them. I can count chars until `\n`, that will be my `MAXC`, use it to `getline()` and `>>` directly in the variables. If I need to only perform actions on certain lines, I can skip to a different `\n` based on an `enum`, or even a string array. Maybe I can even do it without the help of an intermediate `char` array, or `stringstream`, as in your examples. So I can use a loop and avoid `switch`. It starts looking better, and there I thought I would be flayed alive for daring ask such a question, against "the books", by the looks of it. – a concerned citizen Apr 09 '18 at 06:18
  • Well, there is educational value in figuring out all the different ways you can "skin-a-cat". From it comes an understanding of which coding tool to pickup to handle which coding problem. This is the "nuts-and-bolts" of learning. It's not the most elegant part of coding -- but it darn sure is the most important. A house is only as good as the foundation it is built on... – David C. Rankin Apr 09 '18 at 06:38
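
The line-skipping idea discussed in the comments above (using `ignore` to step over `\n` characters) could look roughly like this. It is a sketch, not the answerer's code: `skip_lines` is a hypothetical name, and it needs `<limits>` in addition to `<fstream>`.

```cpp
#include <fstream>
#include <limits>

// Hypothetical helper: discard `n` complete lines from `in`.
// Returns true only if `n` newline characters were actually consumed;
// returns false if the file ended first.
bool skip_lines(std::istream& in, int n)
{
    for (int i = 0; i < n; ++i) {
        // Passing the maximum streamsize removes the character-count limit,
        // so ignore() stops only at '\n' or at end of file.
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        if (!in || in.eof())
            return false;   // ran out of input before finding '\n'
    }
    return true;
}
```

To read only the third record, for instance, one could call `skip_lines(f, 2)` and, if it returns true, follow it with a single `f >> a >> comma >> b >> comma >> c >> comma >> d;`.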