
There are many existing answers for CSV to vector in C++, such as reading a CSV file and adding all of its data to a vector. However, I want something a bit different from file -> vector. Instead, I have a CURL function that loads CSV data into a std::string in the format of

col1,col2,col3
abc,2,ghi
jkl,2,pqr

where each row is separated by \n. How can I parse data in this structure into a std::vector<data>?

Here, data would look something like:

struct data
{
  std::string col1, col3;
  int col2;
};
  • Read from `std::stringstream` instead of `fstream`. – MikeCAT Jul 08 '21 at 13:37
  • @MikeCAT How would I do that? The format is the whole `std::string` being separated by commas and `\n`, which I would want to be a new entry in a `std::vector` – asjhdbashjdbasjhdbhjb Jul 08 '21 at 13:42
  • `std::istringstream stream(myString);` then the code is the same as it would be for processing lines from an fstream. – drescherjm Jul 08 '21 at 13:44
  • @drescherjm But then I have to separate it on `\n` – asjhdbashjdbasjhdbhjb Jul 08 '21 at 13:46
  • @MikeCAT Also, suppose it was an enormous `std::string`; wouldn't it be inefficient to construct an entirely new `stringstream` from it, rather than loading the data into the `stringstream` in the first place? – asjhdbashjdbasjhdbhjb Jul 08 '21 at 13:52
  • @asjhdbashjdbasjhdbhjb *which, each row is separated by \n* -- Why is that a concern if the file is a text file and you open the file in text mode? Each `getline` will read a line. – PaulMcKenzie Jul 08 '21 at 13:56
  • @PaulMcKenzie I'm not opening a file, I am just running a `CURL` function that loads the data from a `csv` on a website into a `std::string` – asjhdbashjdbasjhdbhjb Jul 08 '21 at 13:57
  • Then I don't see the issue with using `std::istringstream`. Have you actually tried using it? – PaulMcKenzie Jul 08 '21 at 14:04
  • @PaulMcKenzie 1. Is there a better solution where I don't have to copy each row from the `std::string` into a `std::istringstream` and then into the `std::vector`, but rather go straight from `std::string` to `std::vector`? 2. I don't know how to delimit the `istringstream` by `\n` and load that into a vector. – asjhdbashjdbasjhdbhjb Jul 08 '21 at 14:07
  • @asjhdbashjdbasjhdbhjb -- *I don't know how to delimit the istringstream by \n* -- The stream already comes with `\n`, according to your description. There is nothing for you to delimit, since it's there already. – PaulMcKenzie Jul 08 '21 at 14:12
  • @PaulMcKenzie Oh okay. From there then, how can I load that into a vector? – asjhdbashjdbasjhdbhjb Jul 08 '21 at 14:13
  • Oh, I should have been much clearer in my post, sorry. I just realised that the columns don't necessarily have to be `std::string`; they could be a float or int etc., so I would like to parse each row into a `struct` with the corresponding column types. – asjhdbashjdbasjhdbhjb Jul 08 '21 at 14:44
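The `std::istringstream` + `std::getline` approach suggested in the comments can be sketched as follows. This is a minimal sketch, assuming the downloaded text is already in a `std::string`, the first row is a header, and no field contains a quoted comma; `parse_csv` is an illustrative name, not part of any library. For a float column, `std::stod` would take the place of `std::stoi`.

```cpp
#include <sstream>
#include <string>
#include <vector>

struct data
{
  std::string col1, col3;
  int col2;
};

// Split the text on '\n' with std::getline, then split each row on ','.
// Note: constructing the std::istringstream copies `text` into the stream.
std::vector<data> parse_csv(const std::string& text)
{
  std::vector<data> rows;
  std::istringstream stream(text);
  std::string line;
  std::getline(stream, line);                 // discard the header row
  while (std::getline(stream, line)) {
    if (!line.empty() && line.back() == '\r') {
      line.pop_back();                        // tolerate CRLF line endings
    }
    if (line.empty()) {
      continue;                               // skip blank lines
    }
    std::istringstream fields(line);
    std::string col2;
    data d;
    if (std::getline(fields, d.col1, ',') &&
        std::getline(fields, col2, ',') &&
        std::getline(fields, d.col3)) {       // col3 is the rest of the row
      d.col2 = std::stoi(col2);               // throws if col2 is not a number
      rows.push_back(d);
    }
  }
  return rows;
}
```

Usage would look like `std::vector<data> rows = parse_csv(downloaded);`, where `downloaded` is a hypothetical name for the string filled by the CURL callback. The copy into the stream is the overhead raised in the comments above; for very large downloads it is usually small compared with the network transfer itself.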

1 Answer


If this is the only parser you need in your application, you can build a simple streaming parser like this:

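#include <cstdlib>  // std::strtol
#include <cstring>  // std::strspn, std::strpbrk, std::strchr, std::strlen
#include <iostream>
#include <string>
#include <vector>

struct data
{
  std::string col1;
  int col2;
  std::string col3;
};

std::ostream& operator<<(std::ostream& to, const data& d)
{
  to << d.col1 << ',';
  to << d.col2 << ',';
  to << d.col3;
  return to; // falling off the end of a non-void function is undefined behaviour
}

// Skip spaces and tabs inside a row (not line breaks).
static const char* skip_spaces(const char* csv)
{
  constexpr const char* BLANKS = "\t ";
  return csv + std::strspn(csv, BLANKS);
}

// Parse one "text,int,text" row into `to`; `ok` reports success.
// Returns a pointer just past the row's '\n', or nullptr after the last row
// or on a malformed row.
static const char* parse_csv_line(const char* csv, data& to, bool& ok)
{
  ok = false;
  const char* b = skip_spaces(csv);
  const char* e = std::strpbrk(b, ",\n");
  if (nullptr == e || ',' != *e) {
    return nullptr; // first comma missing on this row
  }
  to.col1 = std::string(b, e);

  b = skip_spaces(e + 1);
  char* num_end = nullptr;
  to.col2 = static_cast<int>(std::strtol(b, &num_end, 10));
  const char* after = skip_spaces(num_end);
  if (',' != *after) {
    return nullptr; // col2 is not followed by a comma
  }

  b = skip_spaces(after + 1);
  e = std::strchr(b, '\n');
  if (nullptr == e) {
    e = b + std::strlen(b); // last row has no trailing '\n'
  }
  const char* last = e;
  while (last > b && ('\r' == last[-1] || ' ' == last[-1] || '\t' == last[-1])) {
    --last; // drop the '\r' of CRLF line endings and trailing blanks
  }
  to.col3 = std::string(b, last);
  ok = true;
  return ('\0' == *e) ? nullptr : e + 1;
}

std::vector<data> parse_csv(const char* csv)
{
  std::vector<data> ret;
  // skip header
  csv = std::strchr(csv, '\n');
  while (nullptr != csv) {
    csv += std::strspn(csv, "\t\n\v\f\r "); // skip line breaks and blank lines
    if ('\0' == *csv) {
      break; // input ended with a newline
    }
    data next;
    bool ok = false;
    csv = parse_csv_line(csv, next, ok);
    if (!ok) {
      break; // stop at the first malformed row
    }
    ret.push_back(next);
  }
  return ret;
}

int main()
{
  const char* CSV = "col1,col2,col3\r\nabc,2,ghi\r\njkl,2,pqr";
  std::vector<data> parsed = parse_csv(CSV);
  for (const auto& d : parsed) {
    std::cout << d << std::endl;
  }
  return 0;
}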

If you need something more complex, e.g. proper error handling or quoted fields, use a dedicated CSV parsing library.

Victor Gubin
  • Hi, I am sorry, I didn't clarify my question correctly, please see my edit. – asjhdbashjdbasjhdbhjb Jul 08 '21 at 14:45
  • @asjhdbashjdbasjhdbhjb no problem at all, see my update – Victor Gubin Jul 08 '21 at 18:03
  • Is there anywhere that I need to `free()` since it is returning `char*`'s? – asjhdbashjdbasjhdbhjb Jul 08 '21 at 23:14
  • That works really well, it takes about 8 milliseconds to load a `369 KB` csv file into a vector! – asjhdbashjdbasjhdbhjb Jul 09 '21 at 00:08
  • `Is there anywhere that I need to free() since it is returning char*'s?` It returns a vector, so you don't need to free anything. Use string.data() or string.c_str() to get a raw, null-terminated C pointer from a std::string. The parser is actually written more in C style than C++; string_view, from_chars etc. would be the more C++ way to do the same thing. – Victor Gubin Jul 09 '21 at 08:00
  • I'm referring to `skip_spaces`, `parse_csv_line` and `std::strchr`. They all return `char*`, so I don't know how they can return that if it is stack-allocated. – asjhdbashjdbasjhdbhjb Jul 09 '21 at 08:23
  • They manipulate raw pointers, i.e. memory addresses, just like assembly. The memory can be on the stack or the heap; it doesn't matter. You can be sure the memory is not going to be changed, since the C standard library has no bugs. – Victor Gubin Jul 09 '21 at 09:02
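The "more C++ way" mentioned in the comments, using `std::string_view` and `std::from_chars`, might look roughly like this. It is a minimal sketch assuming C++17, a header row, and no quoted fields; `parse_csv_sv` and `next_field` are hypothetical helper names. It also illustrates the ownership question above: every `std::string_view` only points into the caller's buffer, so nothing has to be freed, but the input must stay alive while it is being parsed.

```cpp
#include <charconv>
#include <string>
#include <string_view>
#include <system_error>
#include <vector>

struct data
{
  std::string col1;
  int col2 = 0;
  std::string col3;
};

// Return the text before `delim` and advance `rest` past the delimiter.
// If `delim` is absent, the whole remainder is returned and `rest` is emptied.
static std::string_view next_field(std::string_view& rest, char delim)
{
  const std::size_t pos = rest.find(delim);
  std::string_view field = rest.substr(0, pos);
  rest.remove_prefix(pos == std::string_view::npos ? rest.size() : pos + 1);
  return field;
}

// No stream copy: only col1 and col3 are copied, when stored in `data`.
std::vector<data> parse_csv_sv(std::string_view csv)
{
  std::vector<data> rows;
  next_field(csv, '\n');                      // skip the header row
  while (!csv.empty()) {
    std::string_view line = next_field(csv, '\n');
    if (!line.empty() && line.back() == '\r') {
      line.remove_suffix(1);                  // tolerate CRLF line endings
    }
    if (line.empty()) {
      continue;                               // blank or trailing line
    }
    std::string_view c1 = next_field(line, ',');
    std::string_view c2 = next_field(line, ',');
    data d;
    const auto result = std::from_chars(c2.data(), c2.data() + c2.size(), d.col2);
    if (result.ec != std::errc()) {
      continue;                               // skip rows whose col2 is not an int
    }
    d.col1 = std::string(c1);
    d.col3 = std::string(line);               // whatever remains of the row
    rows.push_back(d);
  }
  return rows;
}
```

A `std::string` filled by the CURL callback can be passed directly, since it converts implicitly to `std::string_view`. For a float column, `std::from_chars` also has floating-point overloads, though some older standard libraries did not implement them.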