I want to create a simple inverted index. I have a file with docIds and the keywords that appear in each document. The first step is to read the file and tokenize the text. I found a `tokenize` function online that was supposed to work and changed it a little. I want to split each line into words at every blank space; my text file doesn't contain any commas or periods. After tokenizing, the tokens are stored in a vector. However, after running `tokenize` I tried printing the elements of the vector and nothing was printed. Then I printed the size of the vector, and it is 0. Here is my code:
```cpp
#include <iostream>
#include <fstream>
#include <string>
#include <sstream>
#include "functions.h"
#include <vector>
using namespace std;

int main()
{
    string line;
    vector<string> v;
    ifstream myfile("test.txt");
    if (myfile.is_open()) {
        while (getline(myfile, line)) {
            //cout << line << '\n';
            tokenize(line, ' ', v);
        }
        myfile.close();
    }
    else cout << "Unable to open file";
    cout << v.size() << '\n';
    return 0;
}
```
And here is my `tokenize` function:
```cpp
#include <string>
#include <vector>
using namespace std;

void tokenize(string s, char c, vector<string> v) {
    string::size_type i = 0;
    string::size_type j = s.find(c);
    while (j != string::npos) {
        v.push_back(s.substr(i, j-i));
        i = ++j;
        j = s.find(c, j);
        if (j == string::npos)
            v.push_back(s.substr(i, s.length()));
    }
}
```
I can't use `strtok` because I will use threads later in the program, and I've read on a forum that `strtok` doesn't work well with threads.