How can I read and manipulate CSV file data in C++?

Question

Pretty self-explanatory. I tried Google and got a lot of the dreaded expertsexchange, and I searched here as well to no avail. An online tutorial or example would be best. Thanks guys.

I wrote libcsv a while ago, it is a small and very fast CSV parser in C and can be used from C++ as well. The download contains documentation and sample programs. You can check it out at http://sourceforge.net/projects/libcsv/. — Robert Gamble, Jan 06 '09 at 05:29
How do you expect to get more rep if you give answers as comments, Robert? :D — Jonathan Leffler, Jan 06 '09 at 05:52
@ZamfirKerlukson this was asked about 6 months before that question. — zkwentz, Feb 23 '13 at 15:19
@Shog9 in what way is this duplicate? This was asked before that other question. — zkwentz, Aug 25 '17 at 22:37
They're asking the same question; I just picked the one with more activity. — Shog9, Aug 25 '17 at 22:41

score 60 · Answer 1 · edited May 23 '17 at 12:10

More information would be useful.

But the simplest form:

#include <iostream>
#include <sstream>
#include <fstream>
#include <string>

int main()
{
    std::ifstream  data("plop.csv");

    std::string line;
    while(std::getline(data,line))
    {
        std::stringstream  lineStream(line);
        std::string        cell;
        while(std::getline(lineStream,cell,','))
        {
            // You have a cell!!!!
        }
    }
}
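
Two things to watch for with the std::getline loop above: a trailing empty cell (a line such as "a,b,") is silently dropped, and a file with CRLF line endings can leave a '\r' on the last cell when the platform does not translate line endings. Here is a minimal sketch of a splitter that handles both. The name splitSimpleCsvLine is made up for this example, and it still does not understand quoted cells.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not from the answer above): splits one line on commas.
// Keeps empty cells, including a trailing one, and strips a trailing '\r'.
// Does NOT handle quoted cells.
std::vector<std::string> splitSimpleCsvLine(std::string line)
{
    // Drop a carriage return left over from CRLF line endings.
    if (!line.empty() && line[line.size() - 1] == '\r')
        line.erase(line.size() - 1);

    std::vector<std::string> cells;
    std::string::size_type start = 0;
    while (true) {
        std::string::size_type comma = line.find(',', start);
        if (comma == std::string::npos) {
            // Last cell on the line (may be empty).
            cells.push_back(line.substr(start));
            break;
        }
        cells.push_back(line.substr(start, comma - start));
        start = comma + 1;
    }
    return cells;
}
```

You would call it on each line read by the outer std::getline loop instead of building a std::stringstream. Note that an empty line comes back as a single empty cell.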

Also see this question: CSV parser in C++

answered Jan 06 '09 at 05:18 by Martin York; edited May 23 '17 at 12:10 by Community

Yeah, but what's the fun in that? :p – Jan 06 '09 at 05:21
it gets more complicated if you allow commas in cells, perhaps by quoting cells, escaping commas or both. – wilhelmtell Jan 06 '09 at 05:28
Thanks a lot man. If I may though, how would I get data from a csv hosted online? Would I just do data("http://csvhost.com/plop.csv") or is there something else? – zkwentz Jan 06 '09 at 05:37
libCURL is an easy-to-use C library that can fetch a remote file over HTTP(S). Other frameworks exist, such as POCO (or probably something in Boost or ACE). The C++ Standard I/O streams don't address protocol-aware remote file download. – Tom Jan 06 '09 at 05:44
nice STL-ized version! I get wary of cluttering the source with boost craziness for the stupid file IO, when it's what I'm going to do with it that needs debugging. – peter karasev Apr 06 '11 at 05:47

score 21 · Answer 2 · answered Jan 06 '09 at 10:49

You can try the Boost Tokenizer library, in particular the Escaped List Separator.

answered Jan 06 '09 at 10:49 by Alessandro Jacopson

This is the best way. escaped_list_separator<> correctly handles edge cases such as quoted strings with commas inside them. – Ferruccio Jan 06 '09 at 12:49
quoted strings are not edge cases (unless you have tunnel vision) – Tom Jan 07 '09 at 05:23

score 9 · Accepted Answer · answered Jan 06 '09 at 05:41

If what you're really doing is manipulating a CSV file itself, Nelson's answer makes sense. However, my suspicion is that the CSV is simply an artifact of the problem you're solving. In C++, that probably means you have something like this as your data model:

struct Customer {
    int id;
    std::string first_name;
    std::string last_name;
    struct {
        std::string street;
        std::string unit;
    } address;
    char state[2];
    int zip;
};

Thus, when you're working with a collection of data, it makes sense to have std::vector<Customer> or std::set<Customer>.

With that in mind, think of your CSV handling as two operations:

// if you wanted to go nuts, you could use a forward iterator concept for both of these
class CSVReader {
public:
    CSVReader(const std::string &inputFile);
    bool hasNextLine();
    void readNextLine(std::vector<std::string> &fields);
private:
    /* secrets */
};
class CSVWriter {
public:
    CSVWriter(const std::string &outputFile);
    void writeNextLine(const std::vector<std::string> &fields);
private:
    /* more secrets */
};
void readCustomers(CSVReader &reader, std::vector<Customer> &customers);
void writeCustomers(CSVWriter &writer, const std::vector<Customer> &customers);

Read and write a single row at a time, rather than keeping a complete in-memory representation of the file itself. There are a few obvious benefits:

Your data is represented in a form that makes sense for your problem (customers), rather than the current solution (CSV files).
You can trivially add adapters for other data formats, such as bulk SQL import/export, Excel/OO spreadsheet files, or even an HTML <table> rendering.
Your memory footprint is likely to be smaller (depends on relative sizeof(Customer) vs. the number of bytes in a single row).
CSVReader and CSVWriter can be reused as the basis for an in-memory model (such as Nelson's) without loss of performance or functionality. The converse is not true.
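
To make the row-at-a-time idea concrete, here is a rough sketch of my own, not the answerer's implementation. The names StreamCsvReader, CustomerRecord and readCustomerRecords are made up for illustration. It reads from any std::istream instead of taking a file name, folds hasNextLine/readNextLine into a single call that returns bool, uses a cut-down customer type, and splits naively on commas with no quote handling.

```cpp
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the Customer struct above.
struct CustomerRecord {
    int id;
    std::string first_name;
    std::string last_name;
};

// Hypothetical row-at-a-time reader over any input stream.
class StreamCsvReader {
public:
    explicit StreamCsvReader(std::istream &in) : in_(in) {}

    // Reads the next line into `fields`; returns false at end of input.
    // Assumption: cells never contain commas or quotes.
    bool readNextLine(std::vector<std::string> &fields) {
        std::string line;
        if (!std::getline(in_, line))
            return false;
        fields.clear();
        std::istringstream cells(line);
        std::string cell;
        while (std::getline(cells, cell, ','))
            fields.push_back(cell);
        return true;
    }

private:
    std::istream &in_;
};

// Converts rows into domain objects one at a time; the file as a whole
// is never held in memory, only the resulting customers.
void readCustomerRecords(StreamCsvReader &reader,
                         std::vector<CustomerRecord> &customers)
{
    std::vector<std::string> fields;
    while (reader.readNextLine(fields)) {
        if (fields.size() < 3)
            continue; // skip rows that do not have id, first name, last name
        CustomerRecord c;
        c.id = std::atoi(fields[0].c_str());
        c.first_name = fields[1];
        c.last_name = fields[2];
        customers.push_back(c);
    }
}
```

Taking a stream rather than a file name is a design choice for this sketch: the same reader then works on a std::ifstream, a std::istringstream in tests, or data fetched from elsewhere.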

Beware of quotes. There are multiple kinds of escaping for CSV. If your string can contain comma, it's quoted. If it is quoted and contains double quote, you're in trouble. I believe Excel escapes quotes by doubling them, but I am not sure. — , Jan 06 '09 at 16:36

score 8 · Answer 4 · answered Jan 06 '09 at 14:43

I've worked with a lot of CSV files in my time. I'd like to add some advice:

1 - Depending on the source (Excel, etc), commas or tabs may be embedded in a field. Usually, the rule is that they will be 'protected' because the field will be double-quote delimited, as in "Boston, MA 02346".

2 - Some sources will not double-quote delimit all text fields. Other sources will. Others will delimit all fields, even numerics.

3 - Fields containing double-quotes usually get the embedded double quotes doubled up (and the field itself delimited with double quotes, as in "George ""Babe"" Ruth").

4 - Some sources will embed CR/LFs (Excel is one of these!). Sometimes it'll be just a CR. The field will usually be double-quote delimited, but this situation is very difficult to handle.
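
The four points above can be handled by reading character by character instead of line by line. This is only a sketch under my own assumptions: comma delimiter, double-quote quoting with doubled quotes as the escape, and LF, CRLF or bare CR as record terminators. The names readCsvRecord and parseCsvRecord are invented for this example, and it does not reject malformed input.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads one record (possibly spanning several physical
// lines) from `in` into `fields`. Returns false when there is no more input.
bool readCsvRecord(std::istream &in, std::vector<std::string> &fields)
{
    fields.clear();
    std::string field;
    bool inQuotes = false;
    bool sawAny = false;
    char c;
    while (in.get(c)) {
        sawAny = true;
        if (inQuotes) {
            if (c == '"') {
                if (in.peek() == '"') {   // doubled quote -> literal quote
                    in.get(c);
                    field += '"';
                } else {
                    inQuotes = false;     // closing quote
                }
            } else {
                field += c;               // commas, CR and LF are kept as data
            }
        } else if (c == '"') {
            inQuotes = true;              // opening quote
        } else if (c == ',') {
            fields.push_back(field);
            field.clear();
        } else if (c == '\r' || c == '\n') {
            if (c == '\r' && in.peek() == '\n')
                in.get(c);                // treat CRLF as one terminator
            fields.push_back(field);
            return true;
        } else {
            field += c;
        }
    }
    if (!sawAny)
        return false;                     // nothing left to read
    fields.push_back(field);              // last record without a newline
    return true;
}

// Convenience wrapper for parsing a single record held in a string.
std::vector<std::string> parseCsvRecord(const std::string &text)
{
    std::istringstream in(text);
    std::vector<std::string> fields;
    readCsvRecord(in, fields);
    return fields;
}
```

Call readCsvRecord in a loop on a std::ifstream until it returns false. Because quoted CR/LF characters are kept inside the field, a record that Excel spread over several physical lines comes back as a single row.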

You should be all right if you follow this - http://tools.ietf.org/html/rfc4180 — Tom, Jan 07 '09 at 05:24
In other words, there is no such thing as "CSV format", but rather a family of similar formats. — Raedwald, Nov 14 '12 at 17:47

score 7 · Answer 5 · 2009-01-06T05:25:10.300

This is a good exercise for yourself to work on :)

You should break your library into three parts:

Loading the CSV file
Representing the file in memory so that you can modify it and read it
Saving the CSV file back to disk

So you are looking at writing a CSVDocument class that contains:

Load(const char* file);
Save(const char* file);
GetBody

So that you may use your library like this:

CSVDocument doc;
doc.Load("file.csv");
CSVDocumentBody* body = doc.GetBody();

CSVDocumentRow* header = body->GetRow(0);
for (int i = 0; i < header->GetFieldCount(); i++)
{
    CSVDocumentField* col = header->GetField(i);
    cout << col->GetText() << "\t";
}

for (int i = 1; i < body->GetRowCount(); i++) // i = 1 so we skip the header
{
    CSVDocumentRow* row = body->GetRow(i);
    for (int p = 0; p < row->GetFieldCount(); p++)
    {
        cout << row->GetField(p)->GetText() << "\t";
    }
    cout << "\n";
}

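// GetRow/GetField are the accessors declared below; field 0 is picked only for illustration
body->GetRow(10)->GetField(0)->SetText("hello world");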

CSVDocumentRow* lastRow = body->AddRow();
lastRow->AddField()->SetText("Hey there");
lastRow->AddField()->SetText("Hey there column 2");

doc.Save("file.csv");

Which gives us the following interfaces:

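// forward declarations, so each class can refer to the ones defined after it
class CSVDocumentBody;
class CSVDocumentRow;
class CSVDocumentField;

class CSVDocument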
{
public:
    void Load(const char* file);
    void Save(const char* file);

    CSVDocumentBody* GetBody();
};

class CSVDocumentBody
{
public:
    int GetRowCount();
    CSVDocumentRow* GetRow(int index);
    CSVDocumentRow* AddRow();
};

class CSVDocumentRow
{
public:
    int GetFieldCount();
    CSVDocumentField* GetField(int index);
    CSVDocumentField* AddField();
};

class CSVDocumentField
{
public:
    const char* GetText();
    void SetText(const char* text);
};

Now you just have to fill in the blanks from here :)
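
As one of those blanks, here is a minimal sketch of the quoting a Save implementation would probably need. It assumes the common convention of wrapping a field in double quotes when it contains a comma, a quote or a line break, and doubling any embedded quotes. The names quoteCsvField and formatCsvRow are hypothetical and not part of the interfaces above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns `field` ready to be written to a CSV file.
// Fields without special characters are returned unchanged.
std::string quoteCsvField(const std::string &field)
{
    if (field.find_first_of(",\"\r\n") == std::string::npos)
        return field;

    std::string out = "\"";
    for (std::size_t i = 0; i < field.size(); ++i) {
        if (field[i] == '"')
            out += '"';          // double every embedded quote
        out += field[i];
    }
    out += '"';
    return out;
}

// Hypothetical helper: joins the fields of one row with commas
// (no trailing newline; the caller decides on the line ending).
std::string formatCsvRow(const std::vector<std::string> &fields)
{
    std::string row;
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i != 0)
            row += ',';
        row += quoteCsvField(fields[i]);
    }
    return row;
}
```

A Save method could then loop over its rows and write formatCsvRow(...) followed by a newline for each one.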

Believe me when I say this - investing your time into learning how to make libraries, especially those dealing with the loading, manipulation and saving of data, will not only remove your dependence on the existence of such libraries but will also make you an all-around better programmer.

:)

EDIT

I don't know how much you already know about string manipulation and parsing; so if you get stuck I would be happy to help.

Dude, over the top. Thanks a megaton. — zkwentz, Jan 06 '09 at 05:22
No, don't do it yourself. Use a well-tested library. — Brian, Feb 25 '10 at 20:58

score 6 · Answer 6 · answered Feb 17 '09 at 13:21

Here is some code you can use. The data from the csv is stored inside an array of rows. Each row is an array of strings. Hope this helps.

#include <iostream>
#include <string>
#include <fstream>
#include <sstream>
#include <vector>
typedef std::string String;
typedef std::vector<String> CSVRow;
typedef CSVRow::const_iterator CSVRowCI;
typedef std::vector<CSVRow> CSVDatabase;
typedef CSVDatabase::const_iterator CSVDatabaseCI;
void readCSV(std::istream &input, CSVDatabase &db);
void display(const CSVRow&);
void display(const CSVDatabase&);
int main(){
  std::fstream file("file.csv", std::ios::in);
  if(!file.is_open()){
    std::cout << "File not found!\n";
    return 1;
  }
  CSVDatabase db;
  readCSV(file, db);
  display(db);
}
void readCSV(std::istream &input, CSVDatabase &db){
  String csvLine;
  // read every line from the stream
  while( std::getline(input, csvLine) ){
    std::istringstream csvStream(csvLine);
    CSVRow csvRow;
    String csvCol;
    // read every element from the line that is separated by commas
    // and put it into the vector of strings
    while( std::getline(csvStream, csvCol, ',') )
      csvRow.push_back(csvCol);
    db.push_back(csvRow);
  }
}
void display(const CSVRow& row){
  if(!row.size())
    return;
  CSVRowCI i=row.begin();
  std::cout<<*(i++);
  for(;i != row.end();++i)
    std::cout<<','<<*i;
}
void display(const CSVDatabase& db){
  if(!db.size())
    return;
  CSVDatabaseCI i=db.begin();
  for(; i != db.end(); ++i){
    display(*i);
    std::cout<<std::endl;
  }
}

score 2 · Answer 7 · answered Feb 25 '10 at 20:52

Using boost tokenizer to parse records, see here for more details.

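#include <algorithm>
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/tokenizer.hpp>

using namespace std;
using boost::tokenizer;
using boost::escaped_list_separator;

// `data` is the path of the CSV file; the wrapper function name is illustrative
int parseFile(const string& data)
{
    ifstream in(data.c_str());
    if (!in.is_open()) return 1;

    typedef tokenizer< escaped_list_separator<char> > Tokenizer;

    vector< string > vec;
    string line;

    while (getline(in,line))
    {
        Tokenizer tok(line);
        vec.assign(tok.begin(),tok.end());

        // do something with the record (records with fewer than 3 fields are skipped)
        if (vec.size() < 3) continue;

        copy(vec.begin(), vec.end(),
             ostream_iterator<string>(cout, "|"));

        cout << "\n----------------------" << endl;
    }
    return 0;
}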

Btw, the escaped_list_separator parses a superset of the csv format so you should be good with all the 'corner' cases, http://www.boost.org/doc/libs/1_42_0/libs/tokenizer/escaped_list_separator.htm — stefanB, Feb 25 '10 at 22:25

score 2 · Answer 8 · edited Aug 10 '15 at 18:29

Look at 'The Practice of Programming' (TPOP) by Kernighan & Pike. It includes an example of parsing CSV files in both C and C++. But it would be worth reading the book even if you don't use the code.

(Previous URL: http://cm.bell-labs.com/cm/cs/tpop/)

answered Jan 06 '09 at 05:49 by Jonathan Leffler; edited Aug 10 '15 at 18:29

I don't have access to the book at the moment but I don't believe the solution presented there was complete or robust. – Robert Gamble Jan 06 '09 at 14:25
It may depend on your definition of complete and robust. It was read-only, not write too. But the C++ looked OK to me - simple, robust across Windows, old MacOS and Unix for line endings (CRLF, CR, or LF). It didn't have every bell and whistle; it did handle nested quotes, etc. Code's online at URL. – Jonathan Leffler Jan 06 '09 at 18:48

score 0 · Answer 9 · answered Apr 06 '09 at 04:28

I found this interesting approach:

CSV to C structure utility

Quote: CSVtoC is a program that takes a CSV or comma-separated values file as input and dumps it as a C structure.

Naturally, you can't make changes to the CSV file, but if you just need in-memory read-only access to the data, it could work.

answered Apr 06 '09 at 04:28 by Kevin P.

Link no longer functions. – boatcoder Feb 10 '16 at 16:38
