1

I am using the following code to split each line of a file into tokens, one token per word. My problem is that the token index restarts at 0 on every line, whereas I want a running count of tokens across the whole file. The contents of the file are:

Student details:
Highlander 141A Section-A.
Single 450988012 SA

Program:

#include <iostream>
using std::cout;
using std::endl;

#include <fstream>
using std::ifstream;

#include <cstring>

const int MAX_CHARS_PER_LINE = 512;
const int MAX_TOKENS_PER_LINE = 20;
const char* const DELIMITER = " ";

int main()
{
  // create a file-reading object
  ifstream fin;
  fin.open("data.txt"); // open a file
  if (!fin.good()) 
    return 1; // exit if file not found

  // read each line of the file
  while (!fin.eof())
  {
    // read an entire line into memory
    char buf[MAX_CHARS_PER_LINE];
    fin.getline(buf, MAX_CHARS_PER_LINE);

    // parse the line into blank-delimited tokens
    int n = 0; // a for-loop index

    // array to store memory addresses of the tokens in buf
    const char* token[MAX_TOKENS_PER_LINE] = {}; // initialize to 0

    // parse the line
    token[0] = strtok(buf, DELIMITER); // first token
    if (token[0]) // zero if line is blank
    {
      for (n = 1; n < MAX_TOKENS_PER_LINE; n++)
      {
        token[n] = strtok(0, DELIMITER); // subsequent tokens
        if (!token[n]) break; // no more tokens
      }
    }

    // process (print) the tokens
    for (int i = 0; i < n; i++) // n = #of tokens
      cout << "Token[" << i << "] = " << token[i] << endl;
      cout << endl;
  }
}

Output:

Token[0] = Student
Token[1] = details:

Token[0] = Highlander
Token[1] = 141A
Token[2] = Section-A.

Token[0] = Single
Token[1] = 450988012
Token[2] = SA

Expected:

Token[0] = Student
Token[1] = details:

Token[2] = Highlander
Token[3] = 141A
Token[4] = Section-A.

Token[5] = Single
Token[6] = 450988012
Token[7] = SA

So I want the index to keep incrementing across lines, so that I can easily identify each value by its index. Thanks in advance...

user2754070
  • 2
    I'm just curious, but where are people finding this junk? There's no case (even in C) where `strtok` is an appropriate solution, and there's almost no case in C++ where you should be using the member `getline`, rather than reading into an `std::string`. And of course, `!fin.eof()` as a loop condition is wrong as well. – James Kanze Sep 30 '13 at 14:40
  • `strtok(0, DELIMITER);` is not valid, and should be generating a warning. Strtok's first parameter is a `char*`, and you have passed an `int`. – abelenky Sep 30 '13 at 14:41
  • http://www.boost.org/doc/libs/1_54_0/libs/tokenizer/tokenizer.htm ? – Jaffa Sep 30 '13 at 14:46
  • @JamesKanze Because it seems like the obvious way and C++ tutorials are notoriously bad. – Neil Kirk Sep 30 '13 at 14:46
  • 1
    @NeilKirk The _first_ thing you need to learn when learning C++ is that nothing is obvious. But why are so many tutorials so bad? You'd think that word would get around after a while, people would stop linking to them, and they'd stop showing up in Google. – James Kanze Sep 30 '13 at 14:50
  • 2
    @andre If by "more effective", you mean correct, or "that actually work", then I agree. The issue isn't effectiveness here, it is correctness. – James Kanze Sep 30 '13 at 14:51
  • possible duplicate of [How do I tokenize a string in C++?](http://stackoverflow.com/questions/53849/how-do-i-tokenize-a-string-in-c) – Ferruccio Sep 30 '13 at 16:32
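
The objection to `!fin.eof()` raised in the comments above can be seen in a small experiment. This is only a sketch: the helper names `countWithEofTest` and `countWithReadTest` are made up for illustration, and a `std::istream&` stands in for the file so the behaviour can be tried on an in-memory string. For input that ends with a newline, the first loop runs once more than there are lines, because `eof()` only becomes true after a read has already failed.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts loop iterations using the `!in.eof()` test.
int countWithEofTest(std::istream& in)
{
    int iterations = 0;
    std::string line;
    while (!in.eof()) {          // eof() is set only after a read hits the end
        std::getline(in, line);  // the last call fails, but the body still runs
        ++iterations;
    }
    return iterations;
}

// Hypothetical helper: counts iterations using the read itself as the test.
int countWithReadTest(std::istream& in)
{
    int iterations = 0;
    std::string line;
    while (std::getline(in, line)) // false as soon as a read fails
        ++iterations;
    return iterations;
}
```

Feeding both helpers a `std::istringstream` holding two newline-terminated lines shows the difference: the extra pass of the first loop processes an empty, failed read.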

2 Answers

2

What's wrong with the standard, idiomatic solution:

std::string line;
while ( std::getline( fin, line ) ) {
    std::istringstream parser( line );
    int i = 0;
    std::string token;
    while ( parser >> token ) {
        std::cout << "Token[" << i << "] = " << token << std::endl;
        ++ i;
    }
}

Obviously, in real life, you'll want to do more than just output each token, and you'll want more complicated parsing. But anytime you're doing line oriented input, the above is the model you should be using (probably keeping track of the line number as well, for error messages).

It's probably worth pointing out that in this case, an even better solution would be to use boost::split in the outer loop, to get a vector of tokens.
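
For the numbering shown under "Expected:" in the question, the counter has to outlive the per-line loop. A minimal sketch of the same line-oriented model with the counter declared once, assuming the input is available as a `std::istream` (the function name `printTokens` and the `out` parameter are hypothetical, added so the output can be captured):

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: prints each whitespace-separated token from `in`,
// numbering tokens continuously across lines. Returns the token count.
int printTokens(std::istream& in, std::ostream& out)
{
    int i = 0;                        // declared once, so it is not reset per line
    std::string line;
    while (std::getline(in, line)) {  // stops when no further line can be read
        std::istringstream parser(line);
        std::string token;
        while (parser >> token) {
            out << "Token[" << i << "] = " << token << '\n';
            ++i;
        }
        out << '\n';                  // blank line after each input line
    }
    return i;
}
```

Calling `printTokens(fin, std::cout)` on an opened `std::ifstream` would number the tokens of the sample file from 0 to 7.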

James Kanze
  • You should move `int i = 0;` before the while loop. Otherwise you won't have the expected output. – Olaf Dietsche Sep 30 '13 at 15:10
  • @OlafDietsche The `int i = 0;` _is_ before the while loop. (Look at his sample output to see what he wants.) – James Kanze Sep 30 '13 at 16:33
  • Sorry, I meant to move it before the first while loop. The output labeled "Output:" is what he gets and the output "Expected:" is what he wants. At least, that's what I understand. – Olaf Dietsche Sep 30 '13 at 19:51
  • @OlafDietsche Yes. It was I who misread his question. Yes, the variable (and its initialization) does belong before the first loop. (And in this case, there's no reason to use the nested loops, unless you want to keep track of the line number for error messages. Or use `boost::split`, which is really more appropriate in this case.) – James Kanze Oct 01 '13 at 07:57
0

I would just let iostream do the splitting:

std::vector<std::string> token;
std::string s;
while (fin >> s)
    token.push_back(s);

Then you can output the whole array at once with proper indexes.

for (std::size_t i = 0; i < token.size(); ++i)
    cout << "Token[" << i << "] = " << token[i] << endl;

Update:

You can even omit the vector altogether and output the tokens as you read them from the input stream:

std::string s;
for (int i = 0; fin >> s; ++i)
    std::cout << "Token[" << i << "] = " << s << std::endl;
Olaf Dietsche
  • 2
    What's with the `!fin.eof()`? That's never an appropriate loop condition. – James Kanze Sep 30 '13 at 14:40
  • See here: http://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong for a discussion of what's wrong with `!fin.eof()`. – us2012 Sep 30 '13 at 14:45
  • @JamesKanze, us2012 You're both right. But if OP insists on doing it that way, he can achieve his objective with a separate output variable. – Olaf Dietsche Sep 30 '13 at 14:53
  • @user2754070 What do you mean with it breaks at line[2]? – Olaf Dietsche Sep 30 '13 at 14:59
  • @OlafDietsche If the OP insists on using `fin.eof()`, his code will never work. And if he insists on using `strtok`, it will be excessively fragile, and unmaintainable. Your first solution is fine, at least if he doesn't need to keep the lines separate; there's no point in trying to pretend that the alternatives he seems to favor are acceptable. – James Kanze Sep 30 '13 at 15:00
  • @OlafDietsche My mistake, I was thinking this was comma-delimited for some reason, which would push the entire line into the vector (instead of the pieces of the line). – Zac Howland Sep 30 '13 at 15:11
  • Is there any way I can read these `token`s independently anywhere in my program? For example, if I want to access the value of `token[20]`, is that possible? Please let me know... – user2754070 Oct 01 '13 at 07:11
  • @user2754070 If you keep them in a vector, you can access them anywhere you have access to the vector. – James Kanze Oct 01 '13 at 07:59
  • @JamesKanze Thanks! you mean like `vector token_v[k]=token[i]` ? – user2754070 Oct 01 '13 at 08:29
  • @user2754070 I'm not sure what you're trying to do with your last statement, but it isn't legal. – James Kanze Oct 01 '13 at 08:34
  • True, I got lots of errors! I want to access `token[i]` anywhere in my program. I don't know whether I can save `token[i]` in a vector; so far I cannot. – user2754070 Oct 01 '13 at 08:42
  • @user2754070 When you use the first example and read the whole file into the `token` vector, you can access the tokens later in your program, as long as the vector exists. – Olaf Dietsche Oct 01 '13 at 10:16