I have an issue with a file I am trying to read in and I don't know how to solve it.

The file is a CSV, but the text of the file also contains commas, so each value is wrapped in quotes to show where it starts and ends.

For instance:

"1","hello, ""world""","and then this"  // In text " is written as ""

I would like to know how to deal with quotes using a QFileStream (though I haven't seen a base solution either).

Furthermore, I also can't read the file line by line, as there might be newlines within these quotes.

In R, there is an option of quotes="" which solves these problems.

There must be something in C++. What is it?

PascalVKooten
  • what have you tried? http://stackoverflow.com/questions/1120140/csv-parser-in-c and http://stackoverflow.com/questions/7827274/whats-the-preferred-library-for-csv-parsing-writing-in-c certainly seem like duplicates of your question from my POV – codeling Sep 11 '13 at 06:50
  • @nyarlathotep A standard CSV document (as far as I've seen) has `separator = ,` and `end of line = \n`. In this case, there are quotes involved to allow these characters to occur inside values. In that regard it is different: I cannot simply read in the file using these methods. – PascalVKooten Sep 11 '13 at 07:00
  • @nyarlathotep Also, in the first link provided they explicitly mention not being interested in this situation. – PascalVKooten Sep 11 '13 at 07:03
  • the quotes are in the standard too. you should check the linked questions more thoroughly; the libraries linked in the (second) question can handle this (http://code.google.com/p/csv-parser-cplusplus/, https://code.google.com/p/csvpp/) – codeling Sep 11 '13 at 07:04
  • And what I've tried is reading it in normally, splitting it by comma and all that. This does not work, I need to be able to indicate some quoting setting but I can't find it. – PascalVKooten Sep 11 '13 at 07:04
  • Thanks for the google links, they look helpful. – PascalVKooten Sep 11 '13 at 07:12
  • they would have been in the linked questions already as well... – codeling Sep 11 '13 at 07:42

2 Answers


You can split on a quote character in Qt (or on any other special character, such as '\') by putting a \ before it. For example, string.split("\""); will split the string on the '"' character.

Here is a simple console app to split your file (so far, the easiest solution seems to be splitting on the "," sequence, that is quote, comma, quote):

#include <QFile>
#include <QString>
#include <QStringList>
#include <iostream>

int main()
{
    // open split.csv, in this case located in the project folder
    QFile file("split.csv");
    if (!file.open(QIODevice::ReadOnly)) {
        std::cerr << "cannot open split.csv" << std::endl;
        return 1;
    }
    // dump all of its contents to stdout, just for testing
    std::cout << QString(file.readAll()).toStdString() << std::endl;
    // rewind the file so it can be read again
    file.reset();
    // read the whole file into a QByteArray, pass it to the QString constructor,
    // split that string on the "," sequence (quote, comma, quote) and store the
    // parts in a QStringList, where every element is the value of one csv cell
    // (newer Qt versions use Qt::SkipEmptyParts instead of QString::SkipEmptyParts)
    QStringList list = QString(file.readAll()).split("\",\"", QString::SkipEmptyParts);

    // add back the quotes that were removed by the split
    for (int i = 0; i < list.size(); i++) {
        if (i != 0) list[i].prepend("\"");
        if (i != (list.size() - 1)) list[i].append("\"");
    }
    // print the results to stdout; not using qDebug, because it would add more
    // quotes to the output, which is already confusing enough
    foreach (QString i, list)
        std::cout << i.toStdString() << std::endl;
    return 0;
}

where split.csv contains "1","hello, ""world""","and then this" and the output is:

"1"
"hello, ""world"""
"and then this"
Shf
  • this is cvs, so your code will fail on this line: `no quote,"""qute with"",""quote"""` (there are two problems: no quote and it contains `","` inside a value) – Marek R Sep 11 '13 at 11:55
  • @MarekR (not cvs, csv as in comma-separated values) - indeed it will fail in these cases, that is obvious. I just assumed, judging by the example that was given, that all values are put in quotes. In that case, a `","` inside a value would not be possible to filter out anyway (how would you know which comma is inside a value and which comma separates two quoted values?). Anyway, the main point was to show how it can be split on the `'"'` symbol; the program was just to show how that can be used. If the text format is specified I'll edit the post accordingly; for now it seems like this does what the author wants – Shf Sep 11 '13 at 12:19
  • I don't understand the output; I am looking for a solution that counts this as simply 1 line. – PascalVKooten Sep 11 '13 at 15:41
  • @Dualinity read the line `QString(file.readAll()).split("\",\"",QString::SkipEmptyParts);` again: it is splitting on the `","` string, not just on a comma. The output is simple: every value that was read is printed on a new line. – Shf Sep 11 '13 at 16:50
  • @Dualinity so, you want an easy solution to read this csv file, and instead of putting this into a function, you decide to download, compile, and add to the project a huge library just to read a file? Well, it's up to you to decide – Shf Sep 11 '13 at 16:58
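
The cases raised in the comments above (an unquoted value, and a `","` sequence inside a quoted value) can be told apart by tracking whether the parser is currently inside quotes instead of splitting on a fixed string. Below is a minimal sketch in standard C++ rather than Qt. The function name `splitCsvRecord` is hypothetical, and the rules it assumes (comma separator, optional double quotes around a value, `""` meaning one literal quote) are taken from the example in the question, not from any particular library.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a Qt API): split one CSV record into its values.
// Assumed rules: values are separated by ',', a value may be wrapped in
// double quotes, and inside a quoted value "" stands for one literal quote.
std::vector<std::string> splitCsvRecord(const std::string &record)
{
    std::vector<std::string> fields;
    std::string field;
    bool inQuotes = false;

    for (std::size_t i = 0; i < record.size(); ++i) {
        const char c = record[i];
        if (inQuotes) {
            if (c == '"' && i + 1 < record.size() && record[i + 1] == '"') {
                field += '"';        // doubled quote: keep one, skip the other
                ++i;
            } else if (c == '"') {
                inQuotes = false;    // closing quote
            } else {
                field += c;          // commas and newlines are plain text here
            }
        } else if (c == '"') {
            inQuotes = true;         // opening quote
        } else if (c == ',') {
            fields.push_back(field); // a comma outside quotes ends the value
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);         // the last value has no trailing comma
    return fields;
}
```

With this approach the returned values have their surrounding quotes removed and doubled quotes collapsed, so the example from the question yields three values and the line from the first comment yields two.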

After some googling I found a ready-made solution. See this article about qxt.

Marek R
  • qxt is huge, and including and building that library just to parse simple csv seems like a waste to me. Though I am wrong if the file format varies a lot; then indeed some csv library would be handy. – Shf Sep 11 '13 at 12:36
  • @Shf I state that I would prefer a solution in Qt. – PascalVKooten Sep 11 '13 at 16:03
  • @Dualinity libqxt is NOT Qt. It is another, separate library, and adding it to the project would definitely not be a solution in Qt. Just a probably unneeded dependency – Shf Sep 11 '13 at 16:52
  • That implementation seems to load all the lines at once with `file.readAll()`, not quite recommendable for large files. – Daniel Vérité Sep 13 '13 at 16:15
  • Yea, I am sad about this. I am now facing 2 million lines, that is 3gb of data. – PascalVKooten Sep 16 '13 at 19:29