I'm writing a program for a personal project that takes a list of words and their occurrence counts from Google Books and puts them into a vector, with each word's count attached, so I can whittle the list down. The file has the word, a tab (\t), the number, and a newline (\n), repeated for every word. I don't have much experience with this type of programming, so I was wondering how someone would parse a file formatted this way. Here's what I have so far:
```cpp
#include <iostream>
#include <string>
#include <fstream>
#include <vector>
#define FILE_NAME "words.txt" // hypothetical placeholder: path to the 1-gram .txt file
using namespace std;

// structure denoting a word occurrence
// contains the string of the word and an integer representing its frequency
struct word_occ {
    string word;
    int occurence;
};

vector<word_occ> words_vector;

int main() {
    /*
    File is a .txt file that has the following format:
    word1 #####
    word2 #####
    where word is the word from the English 1-grams from Google Books
    and ##### is the number of occurrences.
    Each word is separated from its occurrences by a tab (\t) and from the next word by a newline (\n).
    All words are entirely lowercase, and all numbers are integers lower than 20,000,000.
    */
    ifstream all_words_list(FILE_NAME);
    string line;
    string line_word;
    int line_occurence = 0;
    word_occ this_line;
    while (getline(all_words_list, line)) {
        // ... <-- what goes here?
        this_line.word = line_word;
        this_line.occurence = line_occurence;
        words_vector.push_back(this_line);
    }
}
```
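One way the loop body could be filled in is sketched below: split each line at the tab with `std::string::find`, then convert the text after the tab to an `int` with a `std::istringstream`. The helper names `parse_line` and `read_word_list` are hypothetical, and the sketch assumes every well-formed line is exactly one word, one tab, and one integer count, as described in the format comment. Lines that don't match are skipped rather than treated as errors.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Same layout as the struct in the question; the field name
// `occurence` is kept so the two fit together.
struct word_occ {
    std::string word;
    int occurence;
};

// Hypothetical helper: parses one "word<TAB>count" line into `out`.
// Returns false if the line has no tab or the count is not a valid integer,
// so malformed lines can be skipped instead of storing garbage.
bool parse_line(const std::string& line, word_occ& out) {
    std::string::size_type tab = line.find('\t');
    if (tab == std::string::npos) {
        return false;  // no tab: not in the expected format
    }
    std::istringstream count_stream(line.substr(tab + 1));
    int count = 0;
    if (!(count_stream >> count)) {
        return false;  // text after the tab does not start with a number
    }
    out.word = line.substr(0, tab);
    out.occurence = count;
    return true;
}

// Hypothetical helper: reads every well-formed line from `in`
// and returns the parsed entries in file order.
std::vector<word_occ> read_word_list(std::istream& in) {
    std::vector<word_occ> words;
    std::string line;
    word_occ entry;
    while (std::getline(in, line)) {
        if (parse_line(line, entry)) {
            words.push_back(entry);
        }
    }
    return words;
}
```

In the original `main`, the whole loop could then become `words_vector = read_word_list(all_words_list);` (dropping this copy of `word_occ` in favor of the existing one). If the file is trusted and no word contains a space, a shorter alternative is `while (all_words_list >> line_word >> line_occurence)`, since `>>` treats both the tab and the newline as whitespace.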