I want to create a simple inverted index. I have a file with docIds and the keywords that appear in each document. The first step is to read the text file and tokenize it. I found a tokenize function online that was supposed to work and changed it a little. I want to split the text on each blank space; my text file doesn't contain any commas or periods. The tokens are supposed to be stored in a vector. After running the tokenize function I tried printing the elements of the vector, but nothing was printed. Then I tried printing the size of the vector, and I get 0. Here is my code:

#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include "functions.h"
#include <vector>

using namespace std;

int main()
{
    string line;
    vector<string> v;
    ifstream myfile("test.txt");


    if(myfile.is_open()){
        while(getline(myfile,line)){
            //cout << line << '\n';
            tokenize(line, ' ', v);
        }

        myfile.close();
    }
    else cout << "Unable to open file";

    cout << v.size() << '\n';

    return 0;
}

And here is my tokenize function:

using namespace std;

void tokenize(string s, char c, vector<string> v) {
   string::size_type i = 0;
   string::size_type j = s.find(c);

   while (j != string::npos) {
      v.push_back(s.substr(i, j-i));
      i = ++j;
      j = s.find(c, j);

      if (j == string::npos)
         v.push_back(s.substr(i, s.length()));
   }
}

I can't use strtok because I will use threads later in the program, and I've read in a forum that strtok is not safe to use with threads.

captain

1 Answer

Why is my vector empty?

Because you are passing the vector by value:

void tokenize(string s, char c, vector<string> v) {

Change it to a reference:

void tokenize(string s, char c, vector<string>& v) {
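
To see the difference in isolation, here is a minimal sketch. The names `addByValue` and `addByReference` are hypothetical helpers made up for illustration; they are not part of the code in the question.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: v is a COPY of the caller's vector.
// The push_back changes only the copy, which is destroyed on return.
void addByValue(std::vector<std::string> v) {
    v.push_back("token");
}

// Hypothetical helper: v refers to the caller's vector,
// so the push_back is visible to the caller after the call.
void addByReference(std::vector<std::string>& v) {
    v.push_back("token");
}
```

Calling `addByValue(v)` on an empty vector leaves `v.size()` at 0, which matches what the question observes; calling `addByReference(v)` afterwards makes it 1.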
R Sahu
  • Oh, thanks, it sort of works. The first line doesn't get tokenized; it is just a number, the number of documents. The function I found online also had `string& s` instead of the `string s` I used. Is there a difference? – captain Feb 05 '15 at 18:35
  • Yes. `string s` makes a copy of the input string, while `string& s` refers to the original string. If you don't mind `tokenize` being able to modify the original string, the reference is more efficient because it avoids the copy. – R Sahu Feb 05 '15 at 18:39
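
On the two points raised in the comments: in the question's function, a line that contains no delimiter at all (such as a first line holding only a number) never enters the `while` loop, so nothing is pushed for it. The sketch below is one possible variant, not the function from the original source. It assumes tokens are separated by single delimiter characters, pushes the final token after the loop, and takes the string by `const` reference so it is neither copied nor modifiable.

```cpp
#include <string>
#include <vector>

// Splits s on the delimiter c and appends each token to v.
// s: const reference, so no copy is made and the caller's string cannot change.
// v: non-const reference, so the caller sees the appended tokens.
void tokenize(const std::string& s, char c, std::vector<std::string>& v) {
    std::string::size_type i = 0;          // start of the current token
    std::string::size_type j = s.find(c);  // position of the next delimiter

    while (j != std::string::npos) {
        v.push_back(s.substr(i, j - i));
        i = j + 1;
        j = s.find(c, i);
    }

    // Whatever follows the last delimiter, or the whole line when the
    // line contains no delimiter. Skipped if nothing is left.
    if (i < s.length())
        v.push_back(s.substr(i));
}
```

With this version a line such as `3` yields the single token `3`. Consecutive delimiters still produce empty tokens, as in the original loop, so extra filtering would be needed if the input can contain double spaces.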