Parser/split const char* in c++

Question

I searched for a solution but did not find anything that answers my question.

I have a C++ program that receives a const char* variable (filedata) and its size (filesize). The content of this variable is in CSV format. Each field is separated by ';'. The content is dynamic and may be longer or shorter, since the variable holds a set of logs. Lines are separated by the delimiter \n.

Example 1 of filedata:

const char* filedata =
    "1496843100;2017-06-07 13:45:00;000002D8;2800;0x23000CCD.VARIABLE67\n"
    "1496843100;2017-06-07 13:45:00;000002D9;2800;0x23000CCD.VARIABLE68";

Example 2 of filedata:

const char* filedata =
    "1496843100;2017-06-07 13:45:00;000002D8;2800;0x23000CCD.VARIABLE67\n"
    "1496843100;2017-06-07 13:45:00;000002D9;2800;0x23000CCD.VARIABLE68\n"
    "1496843100;2017-06-07 13:45:00;000002DA;2800;0x23000CCD.VARIABLE69";

As you can see, example 1 has only 2 lines and example 2 has 3 lines. I never know how many lines I will get. There can be 2, 3, 200, 1000, etc. lines, and the filedata variable holds all of the content.

So my objective is to take this filedata variable (I also have access to filesize) and, for each line, parse fields 1 and 2 (the timestamp and the date in human-readable format).

Expected output (for the example 2):

1496843100 2017-06-07 13:45:00
1496843100 2017-06-07 13:45:00
1496843100 2017-06-07 13:45:00

In example 2 I have 3 lines, so I need to iterate over all lines and parse the specific fields of each line, as in the output above. After this I take the parsed fields and save them to an object list. (This part is already implemented; I'm just having trouble parsing filedata.)

Possible duplicate of [Split a string in C++?](https://stackoverflow.com/questions/236129/split-a-string-in-c) — Andre Kampling, Jun 08 '17 at 09:25
Thanks for the reply. Yes, it partly helps me, to split by the delimiter ';'. But I cannot adapt the code to keep only the "columns" I want for each line. In this example all columns are split by ";", but I don't want all columns. — rrpik93, Jun 08 '17 at 09:35
But why? I'm new to C++, and looking at the example given by @AndreKampling I cannot understand how to ignore the columns that I don't want — rrpik93, Jun 08 '17 at 09:41
In the given example you get a `std::vector<std::string> strVec`. So you can access the elements by: `strVec[idx]` or `strVec.at(idx)` (if you want to use exceptions). If you know how many columns you have (e.g. `COLCOUNT`) you can go into the next line/row by: `strVec[COLCOUNT * rowIdx + colIdx]`. — Andre Kampling, Jun 08 '17 at 09:45
Thanks. Now i think i understand how to do this. Thanks @AndreKampling — rrpik93, Jun 08 '17 at 09:59
If speed is not a huge concern I would consider using `std::regex`, otherwise looping using `std::find(..., '\n')` will likely be the fastest. — Galik, Jun 08 '17 at 10:02
Yes, the speed is important because I'm building a real-time API. @AndreKampling I'm having some difficulty iterating the vector. I need to iterate the vector and, for each iteration (that is, for each line of filedata), get vector.at(0) and vector.at(2). What is the best way to do this? — rrpik93, Jun 08 '17 at 10:22
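
The flat indexing suggested in the comments, `strVec[COLCOUNT * rowIdx + colIdx]`, could be sketched roughly as follows. This is only an illustration: the helper names fieldAt and firstAndThird are hypothetical, and it assumes every row contributes exactly colCount fields to the flat vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the field at (rowIdx, colIdx) from a flat
// vector that stores all rows back to back, colCount fields per row.
const std::string& fieldAt(const std::vector<std::string>& flat,
                           std::size_t colCount,
                           std::size_t rowIdx,
                           std::size_t colIdx)
{
    assert(colIdx < colCount);
    // at() throws std::out_of_range if the row does not exist
    return flat.at(colCount * rowIdx + colIdx);
}

// Hypothetical helper: collects zero-based columns 0 and 2 of every
// complete row, i.e. vector.at(0) and vector.at(2) for each line.
std::vector<std::pair<std::string, std::string>>
firstAndThird(const std::vector<std::string>& flat, std::size_t colCount)
{
    assert(colCount >= 3);
    std::vector<std::pair<std::string, std::string>> rows;
    for (std::size_t row = 0; (row + 1) * colCount <= flat.size(); ++row)
    {
        rows.emplace_back(fieldAt(flat, colCount, row, 0),
                          fieldAt(flat, colCount, row, 2));
    }
    return rows;
}
```

A trailing incomplete row is ignored by the loop condition, which may or may not be what a real log parser wants.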

score 0 · Answer 1 · answered Jun 08 '17 at 10:58

You can use this regex:

const char *regex_str  = "\\d{10};[\\d,-]{10} [\\d,:]{8}"; //verified in http://regexr.com/

Then find all matches of this regex in your input const char*; see "finding all regex" for help (for Windows).

On macOS, std::regex may not work directly; you may need to add -stdlib=libc++ on the command line.
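
Finding all matches of the regex above could look roughly like this. It is a minimal sketch using std::cregex_iterator from the standard <regex> header; the function name findTimestampFields is hypothetical, and each match still contains the ';' between the two fields.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns every "timestamp;date time" match found in
// the buffer [data, data + size), using the regex from the answer.
std::vector<std::string> findTimestampFields(const char* data, std::size_t size)
{
    assert(data != nullptr);
    static const std::regex re("\\d{10};[\\d,-]{10} [\\d,:]{8}");

    std::vector<std::string> matches;
    // std::cregex_iterator walks over all non-overlapping matches in a
    // const char* range; a default-constructed iterator marks the end.
    for (std::cregex_iterator it(data, data + size, re), end; it != end; ++it)
    {
        matches.push_back(it->str());
    }
    return matches;
}
```

Since speed matters to the asker, note that std::regex is generally slower than a hand-written scan, as already pointed out in the comments on the question.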

Andre Kampling · Answer 2 · 2017-06-15T12:45:00.687

Here is working code with the output you want. I used this SO answer to the SO question I referenced in my duplicate flag. I modified it so that the newline character \n also acts as a delimiter. That is why the code has two while loops.

You have to pass the number of columns you want to parse (cols) to the split() function. Optionally, you can also pass the columns that should be excluded (filtCol). The example below the code uses cols = 5 and filtCol = (1 << 1) | (1 << 3), which means all five columns are parsed except columns 2 and 4. Therefore only columns 1, 3 and 5 end up in the resulting vector. I used a bit pattern because it evaluates faster than a list/array of numbers.

#include <string>
#include <sstream>
#include <vector>
#include <iterator>
#include <iostream>

template<typename Out>
void split(const std::string& s, char delim, size_t cols, size_t filtCol, Out result)
{
   std::stringstream ss;
   ss.str(s);
   std::string item;

   /* Two while loops: the outer one separates on new lines first */
   while (std::getline(ss, item))
   {
      std::stringstream ssLine;
      ssLine.str(item);
      std::string itemLine;

      /* Parse line and separate */
      size_t curCol = 0;
      while (std::getline(ssLine, itemLine, delim))
      {
         /* Only add the column if it is in range and not excluded */
         /* by the bit pattern (shift a size_t, not an int)        */
         if (curCol < cols && (~filtCol & (size_t{1} << curCol)))
         {
            *(result++) = itemLine;
         }

         ++curCol;
      }
   }
}

std::vector<std::string> split(const std::string& s, char delim, size_t cols, size_t filtCol = 0)
{
   std::vector<std::string> elems;
   split(s, delim, cols, filtCol, std::back_inserter(elems));
   return elems;
}

/* Example usage */
int main()
{
   const char* filedataI =
       "1496843100;2017-06-07 13:45:00;000002D8;2800;0x23000CCD.VARIABLE67\n"
       "1496843100;2017-06-07 13:45:00;000002D9;2800;0x23000CCD.VARIABLE68\n"
       "1496843100;2017-06-07 13:45:00;000002DA;2800;0x23000CCD.VARIABLE69";

   size_t colsRange = 5; /* Parse from col 1 to 5 (all five) */
   size_t colsFiltered = (1 << 1) | (1 << 3); /* Exclude col 2 and 4 */
   size_t colsPerLine = 3; /* 5 - 2 = 3 */

   std::vector<std::string> strVecI = split(filedataI, ';', colsRange, colsFiltered);
   for (size_t idx = 0; idx < strVecI.size(); ++idx)
   {
      if (idx > 0 && 0 == idx % colsPerLine)
      {
         std::cout << std::endl;
      }
      std::cout << "\"" << strVecI[idx] << "\" " << " ";
   }
}

Output with 3 columns wanted (5 with 2 excluded: cols = 5 and filtCol = (1 << 1) | (1 << 3)). I additionally printed the quotation marks and two spaces in between:

"1496843100"  "000002D8"  "0x23000CCD.VARIABLE67"  
"1496843100"  "000002D9"  "0x23000CCD.VARIABLE68"  
"1496843100"  "000002DA"  "0x23000CCD.VARIABLE69"

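The exclusion bit pattern can also be built from a list of column numbers instead of being written by hand. This is a small sketch under stated assumptions: excludeColumns and isKept are hypothetical helpers that are not part of the answer, and column numbers are 1-based as in the prose above.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical helper: builds the exclusion bit pattern from 1-based
// column numbers, so excludeColumns({2, 4}) equals (1 << 1) | (1 << 3).
std::size_t excludeColumns(std::initializer_list<std::size_t> columns)
{
    std::size_t mask = 0;
    for (std::size_t col : columns)
    {
        assert(col >= 1 && col <= sizeof(std::size_t) * 8);
        mask |= std::size_t{1} << (col - 1);
    }
    return mask;
}

// Mirrors the check used inside split(): true when the zero-based
// column index curCol is not excluded by the bit pattern.
bool isKept(std::size_t mask, std::size_t curCol)
{
    return (~mask & (std::size_t{1} << curCol)) != 0;
}
```

The result of excludeColumns({2, 4}) could then be passed as the filtCol argument of split().
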
Thanks for the reply and the example code. It's working. But in my case it's not exactly what I want. In your code, if I put 4 in 'colsWanted' the first 4 columns are split, if I put 2 the first 2 columns are split, etc. But if I want only column 1 and column 3, what do I need to do? Because if I put 3 in 'colsWanted', columns 1, 2 and 3 are split. Thanks again — rrpik93, Jun 08 '17 at 12:34
Then just don't access the unwanted column. Or change the program so that it can skip columns. — Andre Kampling, Jun 08 '17 at 12:36
In my question I show the example getting the first and second column, and for this your code works well. But if I want the first, second and fourth column it's not working — rrpik93, Jun 08 '17 at 12:36
Good idea, I can try to change the program to take the unwanted columns and return the columns that I want — rrpik93, Jun 08 '17 at 12:37
@rrpik93: I changed my answer/code so that you can exclude columns as you like; they will be skipped and not saved. Now you're fully flexible. Also, you don't need the column count anymore because of the newline character at the end of each line. — Andre Kampling, Jun 08 '17 at 19:52
Hi @rrpik93 if this or any answer has solved your question please consider [accepting it](https://meta.stackexchange.com/q/5234/179419) by clicking the check-mark. This indicates to the wider community that you've found a solution and gives some reputation to both the answerer and yourself. There is no obligation to do this. — Andre Kampling, Jul 09 '17 at 09:47

score 0 · Answer 3 · answered Jun 08 '17 at 11:29

Use the <regex> library with regex_token_iterator as the splitter.

First split by \n, then split each line by ;.

The code:

#include <cstring>
#include <iostream>
#include <regex>
#include <string>
#include <vector>

int main()
{
    const char* filedata =
        "1496843100;2017-06-07 13:45:00;000002D8;2800;0x23000CCD.VARIABLE67\n"
        "1496843100;2017-06-07 13:45:00;000002D9;2800;0x23000CCD.VARIABLE68\n"
        "1496843100;2017-06-07 13:45:00;000002DA;2800;0x23000CCD.VARIABLE69";

    const char* begin_f = filedata;
    const char* end___f = filedata + std::strlen( filedata );

    /* first of all split by newline */

    std::vector< std::string > vec_str;
    std::regex regex1( "\n" );
    std::regex regex2( ";" );

    /* submatch index -1 selects the text between the matches */
    std::regex_token_iterator< const char* > first( begin_f, end___f, regex1, -1 ), last;
    vec_str.assign( first, last );

    for( const std::string& str1 : vec_str ){

        /* then split by semicolon ; and print only the first two fields */
        std::regex_token_iterator< std::string::const_iterator > first( str1.begin(), str1.end(), regex2, -1 ), last;
        int counter = 2;
        while( first != last && counter-- ){
            std::cout << *first++ << " ";
        }
        std::cout << '\n';

    }
}

The output:

1496843100 2017-06-07 13:45:00 
1496843100 2017-06-07 13:45:00 
1496843100 2017-06-07 13:45:00

score 0 · Answer 4 · answered Jun 08 '17 at 21:56

Here is a solution using std::find() that should be pretty fast and efficient. The idea is that an outer loop finds each successive line ending '\n' and an inner loop finds, within that line, each successive field ending ';'.

In the heart of the two loops you get the opportunity to do whatever you like with the columns:

#include <algorithm>
#include <cstring>
#include <iostream>
#include <string>

int main()
{
    char const* filedata =
        "1496843100;2017-06-07 13:45:00;000002D8;2800;0x23000CCD.VARIABLE67\n"
        "1496843100;2017-06-07 13:45:00;000002D9;2800;0x23000CCD.VARIABLE68\n"
        "1496843100;2017-06-07 13:45:00;000002DA;2800;0x23000CCD.VARIABLE69";

    auto filesize = std::strlen(filedata);

    auto line_beg = filedata;
    auto line_end = filedata + filesize;

    // std::find returns line_end when no '\n' is left, so the pointer in the
    // loop condition is never null; the loop exits through the break below
    for(; auto line_pos = std::find(line_beg, line_end, '\n'); line_beg = line_pos + 1)
    {
        auto field_beg = line_beg;
        auto field_end = line_pos;

        auto field_number = 0U;
        for(; auto field_pos = std::find(field_beg, field_end, ';'); field_beg = field_pos + 1)
        {
            ++field_number;

            // select the field number you want here
            if(field_number == 1 || field_number == 2)
            {
                // do something with the field that starts at field_beg
                // and ends at field_pos
                std::cout << ' ' << std::string(field_beg, field_pos);
            }

            // last field of this line reached
            if(field_pos == field_end)
                break;
        }

        std::cout << '\n';

        // last line of the buffer reached
        if(line_pos == line_end)
            break;
    }
}

Output:

 1496843100 2017-06-07 13:45:00
 1496843100 2017-06-07 13:45:00
 1496843100 2017-06-07 13:45:00

s.paszko · Answer 5 · 2017-06-08T11:34:57.747

-1

Fast solution: you can use a method similar to the explode() function from PHP. There is an answer on how to create an explode function in C++. You will probably have to modify the code from that answer to take a standard C string as input.

Then, once you have your own version of explode(), you can do something like std::vector<std::string> lines = explode(filedata,'\n').

The next step is, for each element of lines, to do std::vector<std::string> line_elements = explode(lines[i], ';'). Then you have each field separately and can print/parse whatever you want.
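
The two explode() steps described above might be sketched like this. It is only one possible version: explode() here is a hypothetical implementation written for this illustration (not the code from the linked answer), and firstTwoFields is a made-up name for the part that joins fields 1 and 2.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical explode(): splits s on delim and keeps empty fields,
// loosely modelled on PHP's explode().
std::vector<std::string> explode(const std::string& s, char delim)
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true)
    {
        const std::size_t pos = s.find(delim, start);
        if (pos == std::string::npos)
        {
            parts.push_back(s.substr(start)); // last (or only) part
            break;
        }
        parts.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
    return parts;
}

// Splits the buffer into lines, then each line into fields, and returns
// "field1 field2" for every line that has at least two fields.
std::vector<std::string> firstTwoFields(const char* filedata, std::size_t filesize)
{
    assert(filedata != nullptr);
    std::vector<std::string> out;
    for (const std::string& line : explode(std::string(filedata, filesize), '\n'))
    {
        if (line.empty())
            continue; // e.g. after a trailing newline
        const std::vector<std::string> line_elements = explode(line, ';');
        if (line_elements.size() >= 2)
            out.push_back(line_elements[0] + " " + line_elements[1]);
    }
    return out;
}
```

This version copies every line and every field into its own std::string, so it favors simplicity over speed.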

edited Jun 08 '17 at 11:34

answered Jun 08 '17 at 10:08

s.paszko


Please reread your question and fix it. And please write in English, _sth_ is not an English word. And remove the longer solution, using C string functions in a C++ program is a bad idea. – Jabberwocky Jun 08 '17 at 10:10
