
Using C++ on Linux, I am parsing a large input into words based on multiple delimiters. The input is provided via stdin (there is no other way).

I read stdin with std::getline and then parse each line roughly as follows.

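#include <cctype>
#include <iostream>
#include <string>

int main()
{
    for (std::string single_line; std::getline(std::cin, single_line);)
    {
        std::string single_word;
        for (unsigned char single_character : single_line)
        {
            // Whitespace or punctuation ends the current word; anything else extends it.
            if (!std::isspace(single_character) && !std::ispunct(single_character))
                single_word += static_cast<char>(single_character);
            else if (!single_word.empty())
                single_word.clear();  // a complete word is in single_word here: use it, then reset
        }
        // single_word may still hold the last word of the line here
    }
}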

My question is about the efficiency of using std::getline and then parsing each line one character at a time.

Can this be improved by using other function calls, or perhaps by an approach that uses threads?

pmg
p3t3
  • Does this answer your question? [How do I iterate over the words of a string?](https://stackoverflow.com/questions/236129/how-do-i-iterate-over-the-words-of-a-string) – zerocukor287 Sep 26 '21 at 13:28
  • @zerocukor287 Unfortunately it doesn't. The author of that question specifically asked to "please give precedence to elegance over efficiency". – p3t3 Sep 26 '21 at 13:43
  • Regarding performance there's no theory that beats the three rules: measure, measure and measure. – MatG Sep 26 '21 at 15:36
  • @MatG I couldn't agree more; I was hoping for some heuristic. – p3t3 Sep 26 '21 at 15:39
  • @p3t3 One of the few is to minimize dynamic memory allocations. For the rest there are too many variables: compiler, compiler version, compiler flags, stdlib implementation, OS, OS version, hardware, ... – MatG Sep 26 '21 at 16:01
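
As a rough illustration of the "minimize dynamic memory allocations" heuristic, here is a minimal sketch, assuming C++17 and that the delimiters are whitespace and punctuation. The names `is_delim` and `collect_words` are made up for this example. Instead of building a new std::string per word, it stores std::string_view slices of the line in a vector that is reused between calls.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper: true for whitespace or punctuation (assumed delimiter set).
inline bool is_delim(char c)
{
    const unsigned char uc = static_cast<unsigned char>(c);
    return std::isspace(uc) != 0 || std::ispunct(uc) != 0;
}

// Hypothetical helper: replaces the contents of `out` with views of the words
// in `line`. No per-word std::string is created; every view points into
// `line`, so `line` must stay alive and unchanged while the views are used.
inline void collect_words(std::string_view line, std::vector<std::string_view>& out)
{
    out.clear();  // clear() keeps the capacity, so later calls rarely reallocate
    std::size_t i = 0;
    while (i < line.size()) {
        while (i < line.size() && is_delim(line[i])) ++i;   // skip delimiters
        const std::size_t start = i;
        while (i < line.size() && !is_delim(line[i])) ++i;  // scan one word
        if (i > start) out.push_back(line.substr(start, i - start));
    }
}
```

In the std::getline loop from the question, the same std::string line and the same vector would be passed on every iteration, so both buffers grow once and are then reused. Whether this is actually faster on a given setup still has to be measured.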

1 Answer


From what I understand, you are trying to parse text input and build words based on delimiters. If so, you don't need to parse one line at a time. You can use the std::stringstream class and std::getline directly to split a string on a delimiter. Try this:

#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

using std::cout; using std::cin;
using std::endl; using std::string;
using std::vector; using std::stringstream;

int main(){
    string text = "He said. The challenge Hector heard with joy, "
               "Then with his spear restrain'd the youth of Troy ";
    char del = ' ';
    vector<string> words{};

    // Wrap the text in a stream so std::getline can split it on `del`.
    stringstream sstream(text);
    string word;
    while (std::getline(sstream, word, del))
        words.push_back(word);

    // Print one token per line.
    for (const auto &str : words) {
        cout << str << endl;
    }

    return EXIT_SUCCESS;
}

OUTPUT - He said. The ... Troy

In this method, we put the text string into a stringstream so that we can operate on it with std::getline. getline extracts characters until it finds the given delimiter character and stores the token in the string variable. Note that this method only works when the delimiter is a single character.
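
For more than one delimiter, one possible alternative (not the method above, which is limited to a single character) is to search for any character from a delimiter set with find_first_of and find_first_not_of. This is only a sketch, assuming C++17; the function name `split_on_any` and the delimiter set passed to it are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: splits `text` on any character contained in `delims`.
// Runs of consecutive delimiters produce no empty tokens.
inline std::vector<std::string> split_on_any(std::string_view text, std::string_view delims)
{
    std::vector<std::string> words;
    // Position of the first character that is not a delimiter.
    std::size_t start = text.find_first_not_of(delims);
    while (start != std::string_view::npos) {
        // Position of the delimiter that ends the current word (npos at end of text).
        const std::size_t end = text.find_first_of(delims, start);
        words.emplace_back(text.substr(start, end - start));
        // Skip the delimiters that follow; yields npos once the text is exhausted.
        start = text.find_first_not_of(delims, end);
    }
    return words;
}
```

Calling `split_on_any(line, " \t.,;:!?")` on each line would split on spaces, tabs and the listed punctuation marks. The delimiter string has to list every character that should separate words.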

SAIJAL
  • This is similar to my solution. Please note that I already use std::getline(), and I use it on stdin. Also, I have more than one delimiter (I need to split on all whitespace and punctuation). – p3t3 Sep 26 '21 at 13:47
  • @p3t3 Well, there's no general-purpose ideal tokenizing and parsing in C++. You need to tailor that to your specific protocols and use cases. Hence your question is too broad given that narrow information. – πάντα ῥεῖ Sep 26 '21 at 13:51
  • My issue is not the parsing; when you parse you go over O(n) elements anyway. I am asking about the reading of stdin: for example, reading one byte at a time (using getchar()) versus getline() or std::getline(). Or maybe there is a way to read a chunk of stdin instead of reading it with getline(), which reads until it reaches a "\n". – p3t3 Sep 26 '21 at 13:59
  • @p3t3 In that case I would recommend multithreading. Read in one thread and pass the lines to other threads to do the manipulation. Alternatively you can use mmap, which gives you memory-like access to a file so you can easily read in parallel. Check this out: [Details on mmap](https://stackoverflow.com/questions/258091/when-should-i-use-mmap-for-file-access) – SAIJAL Sep 26 '21 at 15:18
  • @SAIJAL I agree; using multithreading crossed my mind. I will look into mmap, though I am not sure it will work with stdin. – p3t3 Sep 26 '21 at 15:41
  • @p3t3 _"Or maybe there is a way to read a chunk of stdin instead of reading it by getline() which reads until it reaches a "\n""_ Sure, you can use the `read()` function of the `std::istream` interface. – πάντα ῥεῖ Sep 26 '21 at 15:50