
I am aware of several related questions, such as "Parsing a comma-delimited std::string". However, I have written code that fits my specific need: splitting a string (read from a file) at commas while stripping any whitespace. Later I want to convert these substrings to double and store them in a std::vector. Not all operations are shown. Here is the code:

#include "stdafx.h"   // Visual Studio precompiled header; remove it when compiling with g++
#include<cctype>
#include<iostream>
#include<string>
#include<vector>
#include<algorithm>

int main()
{
    std::string str1 = "  0.2345,  7.9  \n", str2;
    str1.erase(remove_if(str1.begin(), str1.end(), isspace), str1.end()); //remove whitespaces
    std::string::size_type pos_begin = { 0 }, pos_end = { 0 };


    while (str1.find_first_of(",", pos_end) != std::string::npos)
    {
        pos_end = str1.find_first_of(",", pos_begin);
        str2 = str1.substr(pos_begin, pos_end- pos_begin);
        std::cout << str2 << std::endl;
        pos_begin = pos_end+1;
    }

}

Output:

0.2345
7.9

So the program goes like this: the while loop searches for an occurrence of ','; pos_end stores the position of the first ','; str2 becomes the substring between pos_begin and pos_end; and pos_begin moves to one past pos_end. The first iteration runs fine.

In the next iteration, pos_end will be std::string::npos (a very large value), and I am not sure what pos_end - pos_begin will be. The same goes for pos_begin (though it is not used afterwards). Is adding a check such as

if (pos_end == std::string::npos)
        pos_end = str1.length();

the way to go?

The program works, though (compiled with g++ -Wall -Wextra prog.cpp -o prog -std=c++11). Is this approach correct?

vcx34178
  • What about `std::istringstream` and `std::getline()` with an appropriate delimiter? –  Feb 15 '18 at 23:56
  • Can it work for variable length string reading from file? – vcx34178 Feb 15 '18 at 23:59
  • Looks reasonable. This might be [a question better asked at codereview](https://codereview.stackexchange.com/help/asking). Note that I linked to the how to ask page. This is on purpose. Make sure you comply with their rules before posting. – user4581301 Feb 16 '18 at 00:00
  • *"Can it work for variable length string reading from file?"* pick it up and move it into a function that takes a string and find out. – user4581301 Feb 16 '18 at 00:00
  • Can someone explain downvote? – vcx34178 Feb 16 '18 at 00:01
  • [*Returns a substring [pos, pos+count). If the requested substring extends past the end of the string, or if count == npos, the returned substring is [pos, size()).*](http://en.cppreference.com/w/cpp/string/basic_string/substr) So if `pos_end` is big it's not a problem. – super Feb 16 '18 at 00:04
  • You might want to look at the answers to [How can I read and parse CSV files in C++?](https://stackoverflow.com/questions/1120140/how-can-i-read-and-parse-csv-files-in-c) and just skip the step of reading lines from the file into the string. – Bo Persson Feb 16 '18 at 00:06
  • No clue on the downvote. I can see an off topic closevote, though. Question: It looks like the destination is `vector v1`. What do you plan to do with empty tokens? eg: ", 0.2345, 7.9 \n" – user4581301 Feb 16 '18 at 00:06
  • @vcx34178 [Yes, it works with variable lengths](http://coliru.stacked-crooked.com/a/5f0c3a84fca448b4). The down vote was for lack of research. There's really plenty of information about that problem around. –  Feb 16 '18 at 00:13
  • @TheDude `The down vote was for lack of research.` I mentioned that in the first line of the question. I'm not asking for the best approach; I want to know whether my approach is correct. – vcx34178 Feb 16 '18 at 00:17
  • @vcx34178 Well, what's actually _"correct"_ may come up as being mostly _opinion based_. I just showed you a _"natural"_ alternative to go. –  Feb 16 '18 at 00:23
  • I would note that `operator>>` by default ignores white space and will read a double value from a text stream. – Martin York Feb 16 '18 at 00:26
  • You could search the internet for "StackOverflow C++ read file comma separated" or "StackOverflow C++ read file CSV" or "C++ read file CSV example". – Thomas Matthews Feb 16 '18 at 01:01
  • Sometimes, I think that other languages are a better choice when it comes to string manipulations. – Michael Dorgan Feb 16 '18 at 01:11
  • @MichaelDorgan Python can do it using `strip` and `split` in a single line, and file reading takes much less code there. But the problem comes when bringing the data into C++. – vcx34178 Feb 16 '18 at 01:16
  • @Michael Dorgan in fact it can be _very_ simple in C++ but everyone on SO starts to shout at you for using C code. I still prefer to write old style syntax analyzers, they look simpler and are more effective :P – Swift - Friday Pie Feb 16 '18 at 01:38

2 Answers


I use the ranges library in C++20 and implement it like below:

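#include <iostream>
#include <ranges>
#include <algorithm>
#include <vector>
#include <string>
#include <cctype>    // std::isspace
#include <cstdlib>   // std::atof

// Build a std::string from one subrange produced by views::split.
// Dereferencing begin() is only safe if the subrange is not an empty one
// at the very end of the string (e.g. after a trailing comma).
auto join_character_in_each_subranges = [](auto &&rng) { 
      return std::string(&*rng.begin(), std::ranges::distance(rng)); };

// Keep only non-whitespace characters; the unsigned char parameter avoids
// undefined behaviour when a negative char is passed to std::isspace.
auto trimming = std::ranges::views::filter([](unsigned char character){ 
      return !std::isspace(character);});

int main()
{
    std::string myline = "  0.2345,  7.9  ";

    std::vector<double> line_list;

    for (std::string&& words : myline 
            | std::ranges::views::split(',') 
            | std::ranges::views::transform(join_character_in_each_subranges))
    {
        auto words_trimming = words | trimming;
        std::string clean_number;
        std::ranges::for_each(words_trimming, 
             [&](auto character){ clean_number += character;});

        line_list.push_back(std::atof(clean_number.c_str()));
    }
}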

First, iterate over myline and split it into subranges on the delimiter:

 myline | std::ranges::views::split(',') 

Then turn each subrange into a std::string by applying join_character_in_each_subranges with the transform adaptor.

std::ranges::views::transform lazily applies the given function to each element of a range and yields a view of the results (unlike std::transform, it does not write them into another range).

 std::ranges::views::transform(join_character_in_each_subranges)

Next, remove the whitespace characters from each word:

auto words_trimming = words | trimming;

and collect the filtered characters into a std::string with

std::ranges::for_each(words_trimming, [&](auto character){ clean_number += character;});

Finally, convert each clean_number to double and push_back it into line_list.

line_list.push_back(std::atof(clean_number.c_str()));
AmirSalar

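Your erase idiom may fail to compile with some compilers and standard libraries because isspace is overloaded (<locale> also declares a two-argument std::isspace template), so passing it directly to an algorithm can be ambiguous. At a certain point, removing whitespace with a range-based for loop might be more efficient. The right algorithm depends on whether you need to store the tokens, whether you need to correct "syntax" errors in the line, and whether empty tokens should be stored.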

#include<iostream>
#include<string>
#include<list>
#include<algorithm>
#include<cctype>

typedef std::list<std::string> StrList;

// Split `in` at any character in `delims` after removing all whitespace;
// empty tokens are skipped.
void tokenize(const std::string& in, const std::string& delims, StrList& tokens)
{
    tokens.clear();

    std::string::size_type pos_begin = 0, pos_end = 0;
    std::string input = in;

    // Wrapping std::isspace in a lambda avoids the overload ambiguity;
    // unsigned char avoids undefined behaviour for negative char values.
    input.erase(std::remove_if(input.begin(),
                               input.end(),
                               [](unsigned char x){ return std::isspace(x); }),
                input.end());

    // Skip any run of delimiters, then take everything up to the next one.
    while ((pos_begin = input.find_first_not_of(delims, pos_end)) != std::string::npos)
    {
        pos_end = input.find_first_of(delims, pos_begin);
        if (pos_end == std::string::npos) pos_end = input.length();

        tokens.push_back(input.substr(pos_begin, pos_end - pos_begin));
    }
}

int main()
{
    std::string str = ",\t,  0.2345,, , ,  7.9  \n";
    StrList vtrToken;

    tokenize(str, ",", vtrToken);

    int i = 1;
    for (auto &s : vtrToken)
        std::cout << i++ << ".) " << s << std::endl;

    return 0;
}

Output:

1.) 0.2345
2.) 7.9

This variant strips all empty tokens. Whether that is right is unknown in your context, so there is no single correct answer. If you have to check whether the string was well-formed, or if you have to replace empty tokens with default values, you have to add additional checks.

Swift - Friday Pie