I have tried some fixes mentioned in other answers but they had no effect on my output. I was not planning on using boost spirit as I am not sure it is the best option for my needs. Also the similar post does not deal with quoted material which contains commas, which is my last issue to resolve at this point.

This is a C++ program that takes a CSV file as input. The file lists features of seals, with 23 values (columns) per entry. When I output rawdata[22] I expect to see the last entry of the first set of data. Instead, I see the last entry (Petitioned) followed by the first entry (2055) of the next seal. When I open the file in a hex editor, the two words are separated by a "." and the hex value is 0a. I have tried setting \r, \n, and \r\n as delimiters, but they do not work. I cannot use "," as the only delimiter because it also appears inside quoted strings; I tested it anyway and it did not fix the issue. How can I separate these values?

OUTPUT:

Petitioned 2055

SAMPLE INPUT:

SpeciesID,Kingdom,Phylum,Class,Order,Family,Genus,Species,Authority,Infraspecific rank,Infraspecific name,Infraspecific authority,Stock/subpopulation,Synonyms,Common names (Eng),Common names (Fre),Common names (Spa),Red List status,Red List criteria,Red List criteria version,Year assessed,Population trend,Petitioned
2055,ANIMALIA,CHORDATA,MAMMALIA,CARNIVORA,OTARIIDAE,Arctocephalus,australis,"(Zimmermann, 1783)",,,,,Arctophoca australis,South American Fur Seal,Otarie fourrure Australe,Oso Marino Austral,LC,,3.1,2016,increasing,N
41664,ANIMALIA,CHORDATA,MAMMALIA,CARNIVORA,OTARIIDAE,Arctocephalus,forsteri,"(Lesson, 1828)",,,,,Arctocephalus australis subspecies forsteri|Arctophoca australis subspecies forsteri,"New Zealand Fur Seal, Antipodean Fur Seal, Australasian Fur Seal, Black Fur Seal, Long-nosed Fur Seal, South Australian Fur Seal",,,LC,,3.1,2015,increasing,N

my code:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
using namespace std;

int main() {
    string line;
    vector<string> rawdata;
    ifstream file ( "/Users/darla/Desktop/Programs/seals.csv" );
    if ( file.good() )
   {
    while(getline(file, line, '"')) {
        stringstream ss(line);
        while (getline(ss, line, ',')) {
            rawdata.push_back(line);
        }
        if (getline(file, line, '"')) {
            rawdata.push_back(line);
        }
    }
   }
    cout << rawdata[22] << endl;


    return 0;
}

Mr Berry
  • Why you are splitting on quotes has me puzzled. Based on what you are trying to do, I would expect `getline` without the specified delimiter to allow the system to figure out the line ending for your platform and then get a complete line before splitting the line on commas. – user4581301 Jan 03 '18 at 22:11
  • The output is spot on: `getline(ss, line, ',')` splits the input by `,` and `Petitioned\n2055` is exactly index 22. Change `while(getline(file, line, '"'))` to `while(getline(file, line))` and see what happens – Killzone Kid Jan 03 '18 at 22:14
  • Possible duplicate of [How can I read and parse CSV files in C++?](https://stackoverflow.com/questions/1120140/how-can-i-read-and-parse-csv-files-in-c) – Barmar Jan 03 '18 at 22:16
  • Removing the '"' from the first while statement works to separate the last entry (Petitioned) from the first of the next set (2055), which is good. However doing this causes my data to no longer be separated by comma. – Mr Berry Jan 03 '18 at 22:18
  • rawdata[23] = `2055,ANIMALIA,CHORDATA,MAMMALIA,CARNIVORA,OTARIIDAE,Arctocephalus,australis,` – Mr Berry Jan 03 '18 at 22:19
  • that is because of your second `if (getline(file, line, '"')) {`. remove it completely – Killzone Kid Jan 03 '18 at 22:19
  • and thanks Barmar, I did read that thread earlier today – Mr Berry Jan 03 '18 at 22:19
  • That almost works but then it does not register the quotation marks surrounding entries with a comma inside the string – Mr Berry Jan 03 '18 at 22:22
  • do you know the encoding of your CSV-file ? what is the new line delimiter you are using when exporting the CSV ? – StPiere Jan 03 '18 at 22:23
  • Recommendation: Get a full line and then call a function that accepts the line as a `string` and returns a `vector` containing the split-up line. This function needs to be quote aware, and I'm not sure if a state machine or brute force with a `stringstream` (the way you are trying to parse) will suit you better. – user4581301 Jan 03 '18 at 22:25
  • I don't know the encoding I downloaded the CSV from IUCN (Animal database). After this test file I want to run large sections of the database through this parser. I'll test out that last idea. – Mr Berry Jan 03 '18 at 22:30
  • This doesn't address the question, but that `if (file.good())` isn't needed. `std::getline` will simply fail if the stream state is not good. – Pete Becker Jan 03 '18 at 22:45
  • Thank you, I removed it. Still working on preserving quoted entries. – Mr Berry Jan 03 '18 at 23:04
  • You have an answer, but something I'd just like to point out. You did a decent job of getting your code near minimal for the MCVE. But bear in mind the concept applies equally to sample data. E.g. you should have been able to provide actual output and expected output for sample data along the following lines: `Col1,Col 2,Col3\nVal,"Quoted with, comma",With spaces no quotes` .... The skill of using simpler data to demonstrate a problem makes it much easier to test and debug. – Disillusioned Jan 04 '18 at 14:33
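
Following the suggestion to use simpler sample data, the behaviour discussed in the comments can be reproduced on a two-line input. This is only a minimal sketch: `split_on` and `split_rows` are hypothetical helper names that do not appear in the question, and neither is quote aware. Splitting the whole text on `,` leaves the newline inside a token (the `Petitioned\n2055` effect), while reading complete lines first keeps the rows apart.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits all of `text` on `delim`, the way the inner
// getline loop in the question does. Newlines are treated as ordinary
// characters, so they stay inside the tokens.
std::vector<std::string> split_on(const std::string& text, char delim)
{
    std::vector<std::string> tokens;
    std::istringstream ss(text);
    std::string token;
    while (std::getline(ss, token, delim))
    {
        tokens.push_back(token);
    }
    return tokens;
}

// Hypothetical helper: reads complete lines first, then splits each line on
// commas. Not quote aware, so a quoted field containing a comma is still
// broken into pieces.
std::vector<std::vector<std::string>> split_rows(const std::string& text)
{
    std::vector<std::vector<std::string>> rows;
    std::istringstream ss(text);
    std::string line;
    while (std::getline(ss, line))
    {
        rows.push_back(split_on(line, ','));
    }
    return rows;
}
```

With the input `"Col1,Col 2,Col3\nVal,x,y"`, `split_on(text, ',')` yields five tokens and the third one contains both `Col3` and `Val` separated by a newline, whereas `split_rows(text)` yields two rows of three fields each.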

1 Answer

This is far from a complete CSV parser and could be made more efficient, but it does the job, parses your file correctly and deals with double quotes as well.

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>
#include <vector>
#include <algorithm>

int main()
{
    std::string line;
    std::vector<std::vector<std::string>> lines;
    std::ifstream file("/Users/darla/Desktop/Programs/seals.csv");

    if (file)
    {
        while (std::getline(file, line))
        {
            size_t n = lines.size();
            lines.resize(n + 1);

            std::istringstream ss(line);
            std::string field, push_field("");
            bool no_quotes = true;

            while (std::getline(ss, field, ',')) 
            {
                if (static_cast<size_t>(std::count(field.begin(), field.end(), '"')) % 2 != 0)
                {
                    no_quotes = !no_quotes;
                }

                push_field += field + (no_quotes ? "" : ",");

                if (no_quotes)
                {
                    lines[n].push_back(push_field);
                    push_field.clear();
                }
            }
        }
    }

    // Iterate by const reference to avoid copying each row and field.
    for (const auto& row : lines)
    {
        for (const auto& field : row)
        {
            std::cout << "| " << field << " |";
        }

        std::cout << std::endl << std::endl;
    }

    return 0;
}

An explanation: the program reads the file line by line and splits each line on commas, storing the results in a vector of vectors. If a piece contains an odd number of double quotes, a quoted field has just been opened or closed; while a quoted field is open, further pieces are appended (with the comma restored) until the piece with the closing quote is found, and then the complete field is stored. If a piece contains an even number of double quotes, or none, and no quoted field is open, it is stored straight away. Hope this helps.
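
The same quote-tracking idea can also be written as a character-by-character state machine, along the lines of the "quote aware" function recommended in the comments on the question. The following is only a sketch under stated assumptions, not the code from this answer: `split_csv_line` is a hypothetical name, a record is assumed never to span several lines, a doubled quote (`""`) inside a quoted field is assumed to mean one literal quote, and a trailing `\r` from a CRLF line ending is dropped. Unlike the program above, it removes the surrounding quotes and keeps a trailing empty field.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line into fields, treating commas
// inside double quotes as part of the field.
std::vector<std::string> split_csv_line(const std::string& line)
{
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;

    for (std::size_t i = 0; i < line.size(); ++i)
    {
        const char c = line[i];

        if (in_quotes)
        {
            if (c != '"')
            {
                field += c;            // commas are ordinary characters here
            }
            else if (i + 1 < line.size() && line[i + 1] == '"')
            {
                field += '"';          // assumed escape: "" means one quote
                ++i;
            }
            else
            {
                in_quotes = false;     // closing quote
            }
        }
        else if (c == '"')
        {
            in_quotes = true;          // opening quote, not stored
        }
        else if (c == ',')
        {
            fields.push_back(field);   // end of the current field
            field.clear();
        }
        else if (c == '\r' && i + 1 == line.size())
        {
            // Drop the carriage return left over from a CRLF line ending.
        }
        else
        {
            field += c;
        }
    }

    fields.push_back(field);           // last field, possibly empty
    return fields;
}
```

Called on each line returned by `std::getline(file, line)`, this would turn `Val,"Quoted with, comma",With spaces no quotes` into three fields, the second being `Quoted with, comma` without the quotes.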

Killzone Kid
  • This one works well and actually takes care of a small error the last answer didn't account for (it cut off the last field of every set - the Petitioned value - which isn't important to me but still). I am playing with this solution as well, thanks a lot. – Mr Berry Jan 03 '18 at 23:49
  • @MrBerry I have just updated the code, because I completely forgot that `getline` would eat `,` Now it should be ok. – Killzone Kid Jan 04 '18 at 00:01
  • Great, I'm using this one as I'll eventually have to run larger databases through it and it preserves all of the values. – Mr Berry Jan 04 '18 at 00:43
  • It does in fact eat the commas, I'm working on fixing that now. – Mr Berry Jan 04 '18 at 05:04
  • @MrBerry have you tried the updated code or there is some other issue? – Killzone Kid Jan 04 '18 at 05:06