1

I'm trying to parse a CSV file and getline() is reading the entire file as one line. On the assumption that getline() wasn't getting what it expected, I tried \r, \n, \n\r, \r\n, and \0 as arguments with no luck.

I took a look at the EOL characters and am seeing CR and then LF. Is getline() just ignoring this or am I missing something? Also, what's the fix here?

The goal of this function is a general-purpose CSV parser that stores the data as a 2D vector of strings. Although advice on that front is welcome, I'm only looking for a way to fix this issue.

vector<vector<string>> Parse::parseCSV(string file)
{
    // input fstream instance
    ifstream inFile;
    inFile.open(file);

    // check for error
    if (inFile.fail()) { cerr << "Cannot open file" << endl; exit(1); }

    vector<vector<string>> data;
    string line;

    while (getline(inFile, line))
    {
        stringstream inputLine(line);
        char delimeter = ',';
        string word;
        vector<string> brokenLine;
        while (getline(inputLine, word, delimeter)) {
            word.erase(remove(word.begin(), word.end(), ' '), word.end());      // remove all white spaces
            brokenLine.push_back(word);
        }
        data.push_back(brokenLine);
    }

    inFile.close();

    return data;

};

Here's the hexdump. I'm not sure what exactly this is showing.

0000000 55 4e 49 58 20 54 49 4d 45 2c 54 49 4d 45 2c 4c
0000010 41 54 2c 4c 4f 4e 47 2c 41 4c 54 2c 44 49 53 54
0000020 2c 48 52 2c 43 41 44 2c 54 45 4d 50 2c 50 4f 57
0000030 45 52 0d 31 34 32 34 31 30 35 38 30 38 2c 32 30
0000040 31 35 2d 30 32 2d 31 36 54 31 36 3a 35 36 3a 34
0000050 38 5a 2c 34 33 2e 38 39 36 34 2c 31 30 2e 32 32
0000060 34 34 34 2c 30 2e 38 37 2c 30 2c 30 2c 30 2c 4e
0000070 6f 20 44 61 74 61 2c 4e 6f 20 44 61 74 61 0d 31
0000080 34 32 34 31 30 35 38 38 35 2c 32 30 31 35 2d 30
0000090 32 2d 31 36 54 31 36 3a 35 38 3a 30 35 5a 2c 34
00000a0 33 2e 39 30 31 33 35 2c 31 30 2e 32 32 30 34 31
00000b0 2c 31 2e 30 32 2c 30 2e 36 33 39 2c 30 2c 30 2c
00000c0 4e 6f 20 44 61 74 61 2c 4e 6f 20 44 61 74 61 0d
00000d0 31 34 32 34 31 30 35 38 38 38 2c 32 30 31 35 2d
00000e0 30 32 2d 31 36 54 31 36 3a 35 38 3a 30 38 5a 2c
00000f0 34 33 2e 39 30 31 34 38 2c 31 30 2e 32 32 30 31
0000100

The first two lines of the file

UNIX TIME,TIME,LAT,LONG,ALT,DIST,HR,CAD,TEMP,POWER
1424105808,2015-02-16T16:56:48Z,43.8964,10.22444,0.87,0,0,0,No Data,No Data

UPDATE: Looks like it was \r. I'm not sure why it didn't work earlier, but I learned a few things while exploring. Thanks for the help, guys.
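
For anyone landing here later, a minimal sketch of that fix, assuming the file really does end each line with a lone carriage return (0x0d), as the hexdump suggests. The helper name `readCrLines` is made up for illustration and is not part of the original code; the only real change is the third argument to `std::getline`.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): split a stream into
// lines, assuming each line is terminated by a lone '\r' (0x0d).
std::vector<std::string> readCrLines(std::istream& in)
{
    std::vector<std::string> lines;
    std::string line;
    // The third argument replaces the default '\n' delimiter with '\r'.
    while (std::getline(in, line, '\r')) {
        lines.push_back(line);
    }
    return lines;
}
```

In parseCSV this would amount to changing the outer loop to `while (getline(inFile, line, '\r'))`. A file with LF or CRLF endings would need different handling, so this is only a fit for CR-terminated input.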

Will Luce
  • Do *any* of the answers to this question, ["How can I read and parse CSV files in C++?"](http://stackoverflow.com/questions/1120140/how-can-i-read-and-parse-csv-files-in-c), help at all ? – WhozCraig Mar 01 '15 at 05:05
  • I've read through them, and although they're addressing what I'm talking about, I'm not grasping what to DO about it. – Will Luce Mar 01 '15 at 05:11
  • Assuming your file is as simple as described, your code looks like it should be correct. So in summary you're saying that `while (getline(inFile, line))` is hitting *once* and slurping *everything*? What platform is this running on? – WhozCraig Mar 01 '15 at 05:23
  • I'm on a Mac working in Xcode. The function runs, but never breaks lines and loads the entire file into one line. The file is in fact >700 lines. – Will Luce Mar 01 '15 at 05:26
  • Was the file created on your Mac as well ? I'm truly curious what a char-by-char walk of that file would look like, because `std::getline` should pull that file per-line correctly unless you have wonky line endings. can you update your question to include a `hexdump filename` of the first 200 or so chars (anything that has what is supposed to be line endings) ? – WhozCraig Mar 01 '15 at 05:31
  • No, it was converted by third-party software. I'm going to add a picture of what I got when I broke it down. – Will Luce Mar 01 '15 at 05:33
  • Please no pictures if possible, and *especially* if it can be demonstrated with a hexdump as text. Just open a console and `hexdump -n 256 filename` pasted in the question as a source list would probably be good enough, assuming the first line is not longer than 256 bytes. Will look [something like this](http://pastebin.com/05vauHkK) and be a nice addition to your question. The actual text of the first couple of lines to accompany will be nice too. – WhozCraig Mar 01 '15 at 05:36
  • Thanks, finally, include the first two lines of actual text from the file. It seems very odd that while-loop hits *once*. You've confirmed `data` has *one* entry, right? (I know, seems a redundant question, but gotta ask). From the looks of that dump the separator is `0x0D`, or `'\r'` *only*. And you say you tried changing the outer `getline` to be `std::getline(inFile, line, '\r')` ? – WhozCraig Mar 01 '15 at 05:45
  • There you go. Data is only the return of the function. It isn't mentioned in the file. – Will Luce Mar 01 '15 at 05:48

2 Answers

1

A simple fix would be to write your own getline,
for example one that skips any combination of \n and \r
at the beginning of a line and stops at the next \n or \r.
That will work on any platform, but won't preserve empty lines.

After looking at the hex dump, the delimiter in your file is 0d (\r).

sp2danny
-1

Did you try to switch the order of the \r\n to \n\r?

MZaragoza
Shay Nehmad
  • Yes, I've switched them. It turns out that getline() only accepts one character as a delimiter. So, neither of them are valid and both cause an error. – Will Luce Mar 01 '15 at 05:30
  • Getline has a delimiter option, right? Maybe use that? – Shay Nehmad Mar 01 '15 at 05:32
  • The delimiter option only takes one character. – Will Luce Mar 01 '15 at 05:34
  • It doesn't. The question lists which combinations I've tried. – Will Luce Mar 01 '15 at 05:37
  • `\n\r` is not a line terminator on any computer system ever built. The suggestion is futile. See [here](http://stackoverflow.com/questions/3307747/difference-between-newline-and-carriage-return-r/3315133#3315133) for some background. – user207421 Apr 20 '15 at 01:32