I'm working on a project that converts Python code to C++ for better performance. The Python project is called Advanced EAST. So far I have the input data for the nms function in a .csv file like this:

"[ 5.9358170e-04  5.2773970e-01  5.0061589e-01 -1.3098677e+00
 -2.7747922e+00  1.5079222e+00 -3.4586751e+00]","[ 3.8175487e-05  6.3440394e-01  7.0218205e-01 -1.5393494e+00
 -5.1545496e+00  4.2795391e+00 -3.4941311e+00]","[ 4.6003381e-05  5.9677261e-01  6.6983813e-01 -1.6515008e+00
 -5.1606908e+00  5.2009044e+00 -3.0518508e+00]","[ 5.5172237e-05  5.8421570e-01  5.9929764e-01 -1.8425952e+00
 -5.2444854e+00  4.5013981e+00 -2.7876694e+00]","[ 5.2929961e-05  5.4777789e-01  6.4851379e-01 -1.3151239e+00
 -5.1559062e+00  5.2229333e+00 -2.4008298e+00]","[ 8.0250458e-05  6.1284608e-01  6.1014801e-01 -1.8556541e+00
 -5.0002270e+00  5.2796564e+00 -2.2154367e+00]","[ 8.1256607e-05  6.1321974e-01  5.9887391e-01 -2.2241254e+00
 -4.7920742e+00  5.4237065e+00 -2.2534993e+00]

Each unit is 7 numbers, but there is a '\n' after the first four numbers. I want to read this CSV file into my C++ project so that I can do the math in C++ and make it faster.

using namespace std;

void read_csv(const string &filename)
{
//File pointer
fstream fin;
//open an existing file
fin.open(filename, ios::in);

vector<vector<vector<double>>> predict;

string line;
while (getline(fin, line))
{
    std::istringstream sin(line);
    vector<double> preds;
    double pred;
    while (getline(sin, pred, ']'))
    {
        preds.push_back(preds);
    }

}

}

So far my code is not working, of course, and I have no idea how to proceed. Please help me read the CSV data into my code. Thanks.

Some programmer dude
BAO TONG
  • Search for a library that does it for you. CSV is a *deceptively* simple format, but with so many corner and special cases that it's non-trivial to parse. Especially true for your input, as it's formatted in a non-conventional way (seems to be a direct dump on console of some internal data of a Python script). – Some programmer dude Jul 11 '19 at 11:06
  • Feed the output of `getline(fin, string, ']')` to a `stringstream`. Discard the first word and read `double`s until end of stream. That is one record. Repeat for the rest. – Botje Jul 11 '19 at 11:12
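
A minimal sketch of the approach in the comment above, under a few assumptions: every record is wrapped in `[` and `]`, the numbers inside are separated only by whitespace (including the embedded newline), and the hypothetical function name `read_records` is chosen here purely for illustration. Instead of discarding the first word, this variant looks for the `[` in each chunk, which also copes with a record such as `[-1.0e+00 ...` where no space follows the bracket.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Returns one vector<double> per "[ ... ]" group found in the stream.
// Newlines inside a group are treated as ordinary whitespace.
std::vector<std::vector<double>> read_records(std::istream& in)
{
    std::vector<std::vector<double>> records;
    std::string chunk;
    // Each chunk holds everything up to (but not including) the next ']'.
    while (std::getline(in, chunk, ']'))
    {
        const auto open = chunk.find('[');
        if (open == std::string::npos)
            continue; // e.g. the trailing quote/newline after the last record

        // Parse whitespace-separated doubles that follow the '['.
        std::istringstream fields(chunk.substr(open + 1));
        std::vector<double> record;
        double value;
        while (fields >> value)
            record.push_back(value);

        if (!record.empty())
            records.push_back(std::move(record));
    }
    return records;
}
```

To use it on a file, open a `std::ifstream` on the CSV path and pass it to `read_records`; each inner vector should then hold the 7 numbers of one unit.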

2 Answers

Unfortunately, parsing strings (and consequently files) is very tedious in C++.

I highly recommend using a library, ideally a header-only one, like this one.

If you insist on writing it yourself, maybe you can draw some inspiration from this StackOverflow question on how to parse general CSV files in C++.

Benno Straub

You could look at the POSIX function getdelim() with ',' as the delimiter, e.g. getdelim(&line, &len, ',', fin).

But the other issue will be those quotes. Unless you /know/ the file is always formatted exactly this way, it becomes difficult.

One hack I have used in the past, which is NOT PERFECT: if the first character is a quote, then the last character before the comma must also be a matching quote, and it must not be escaped.

If it is not a quote, then call getdelim() some more. Because getdelim allocates its buffer automatically, each call needs its own buffer. In C++ I end up with a vector of all the pieces returned by getdelim, which then need to be concatenated to make the final string:

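#include <cstdio>   // FILE, fgetc, getdelim (POSIX)
#include <cstdlib>  // malloc, free
#include <cstring>  // strlen
#include <vector>

// Collects the pieces of one field from fin; the caller joins and free()s them.
std::vector<char*> readField(FILE* fin)
{
  std::vector<char*> gotLine;
  int first = fgetc(fin);
  if (first == EOF) return gotLine;
  gotLine.push_back(static_cast<char*>(malloc(2)));
  gotLine.back()[0] = static_cast<char>(first);
  gotLine.back()[1] = 0;
  bool gotquote = first == '"'; // perhaps different classes of quote
  if (first != ',')
    for (;;)
    {
      char* gotSub = nullptr; // nullptr makes getdelim allocate a fresh buffer
      size_t cap = 0;
      if (getdelim(&gotSub, &cap, ',', fin) < 0) { free(gotSub); break; } // EOF or error
      gotLine.push_back(gotSub);
      if (!gotquote) break;
      auto subLen = strlen(gotSub);
      // gotSub includes the trailing ',' so the closing quote sits at subLen-2
      if (subLen > 1 && gotSub[subLen - 2] == '"') // again different classes of quote
        if (subLen == 2 || gotSub[subLen - 3] != '\\') // needs to be a while loop
          break;
    }
  return gotLine;
}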

Then just concatenate all these string segments back together.

Note that getdelim supports null bytes. If you expect null bytes in the content (and not represented by the character sequences \000 or \@), you need to store the actual length returned by getdelim and use memcpy to concatenate the pieces.
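
As a rough illustration of length-aware concatenation, here is a sketch that assumes each piece has been stored as a (pointer, length) pair, with the length taken from getdelim's return value. The name `joinPieces` is hypothetical, and `std::string::append(ptr, len)` is used in place of a hand-written memcpy; it copies exactly `len` bytes, embedded null bytes included.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Joins (pointer, length) pieces into one std::string without relying on
// null terminators, so embedded '\0' bytes survive.
std::string joinPieces(const std::vector<std::pair<const char*, std::size_t>>& pieces)
{
    std::size_t total = 0;
    for (const auto& p : pieces)
        total += p.second;

    std::string joined;
    joined.reserve(total); // one allocation for the whole result
    for (const auto& p : pieces)
        joined.append(p.first, p.second); // copies exactly p.second bytes
    return joined;
}
```

The caller would still need to free() each buffer that getdelim allocated once the joined string has been built.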

Oh, and if you allow utf-8 extended quotes it gets very messy!

The case this doesn't cover is a string that ends in \\" or \\\\". Ideally you need a while loop that counts the backslashes immediately preceding the quote, and accepts the quote if the count is even.
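
The even-count rule can be sketched like this, assuming backslash is the only escape character and the field is already in a `std::string`. The helper name `isUnescapedQuote` is made up for this example.

```cpp
#include <cstddef>
#include <string>

// Returns true if the character at quotePos is preceded by an even number
// (including zero) of consecutive backslashes, i.e. the quote is not escaped.
bool isUnescapedQuote(const std::string& s, std::size_t quotePos)
{
    std::size_t backslashes = 0;
    // Walk left from the quote, counting the run of backslashes before it.
    while (backslashes < quotePos && s[quotePos - 1 - backslashes] == '\\')
        ++backslashes;
    return backslashes % 2 == 0;
}
```

With this check, a piece ending in \" is treated as still inside the quoted field, while one ending in \\" closes it.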

Note that this leaves the issue of unescaping the quoted content, i.e. converting any \" into " and \\ into \, etc., and also discarding the enclosing quotes.
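
A possible sketch of that unescaping step, assuming the field has already been concatenated into a `std::string` with its trailing comma removed. The name `unquote` is hypothetical, and treating every backslash as "keep the next character literally" is an assumption; a real format may define further escapes such as \n.

```cpp
#include <cstddef>
#include <string>

// Strips one pair of enclosing double quotes (if present) and resolves
// backslash escapes: \" becomes ", \\ becomes \. Any other escaped
// character is copied through unchanged (an assumption for this sketch).
std::string unquote(const std::string& field)
{
    std::size_t begin = 0;
    std::size_t end = field.size();
    if (end >= 2 && field.front() == '"' && field.back() == '"')
    {
        ++begin;
        --end;
    }

    std::string out;
    out.reserve(end - begin);
    for (std::size_t i = begin; i < end; ++i)
    {
        if (field[i] == '\\' && i + 1 < end)
            ++i; // skip the backslash, keep the character it escapes
        out.push_back(field[i]);
    }
    return out;
}
```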

In the end, a library may be easier if you need to deal with completely arbitrary content. But if the content is "known", you can live without one.

Gem Taylor