C++ searching a line from a file for certain words and then inserting a word after those words

Question

I'm very new to C++ and I've been struggling for quite a while trying to figure out how to solve this problem. Basically, I need to read from a file and find all instances of an article ("a","A","an","aN","An","AN","the","The","tHe","thE","THe","tHE","ThE","THE") and then insert an adjective after that article. The adjective's capitalization must be based on the word that originally followed the article. For instance, if I found "a SHARK" I would need to make it "a HAPPY SHARK." Can anyone tell me what the best way to do this would be? So far I've scrapped a lot of ideas and this is what I have now, though I don't think I can do it this way:

#include <iostream>
#include <string>
#include <cctype>
#include <fstream>
#include <sstream>

using namespace std;

void
usage(char *progname, string msg){
    cerr << "Error: " << msg << endl;
    cerr << "Usage is: " << progname << " [filename]" << endl;
    cerr << " specifying filename reads from that file; no filename reads standard input" << endl;
}

int main(int argc, char *argv[])
{
    string adj;
    string file;
    string line;
    string articles[14] = {"a","A","an","aN","An","AN","the","The","tHe","thE","THe","tHE","ThE","THE"};
    ifstream rfile;
    cin >> adj;
    cin >> file;
    rfile.open(file.c_str());
    if(rfile.fail()){
        cerr << "Error while attempting to open the file." << endl;
        return 0;
    }
    while(rfile.good()){
        getline(rfile,line,'\n');
        istringstream iss(line);
        string word;
        while(iss >> word){
            for(int i = 0; i <= 14; i++){
                if(word == articles[i]){
                    cout << word + " " << endl;
                }else{
                    continue;
                }
            }
        }
    }
}

Don't loop on `.good()` but on `getline()`. That's the stream-way of looping until end of file ! — Christophe, Feb 13 '15 at 20:06
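A minimal sketch of the loop shape this comment recommends. The helper name `splitLines` is hypothetical, and an `istringstream` stands in for the `ifstream` from the question so the example is self-contained. With a `.good()` test at the top, the loop body can run one extra time on a read that has already failed; testing the result of `getline()` itself avoids that.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collect the lines of 'text' with a getline-driven loop.
std::vector<std::string> splitLines(const std::string& text) {
    std::istringstream in(text);    // stands in for the ifstream in the question
    std::vector<std::string> lines;
    std::string line;
    while (std::getline(in, line))  // the body runs only if a line was actually read
        lines.push_back(line);
    return lines;
}
```

The same `while (std::getline(stream, line))` condition works unchanged when `stream` is an `ifstream`.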

jschultz410 · Accepted Answer · 2015-02-13T21:16:09.337


So far, pretty good, although if you need to handle an article at the end of a line, then you might be in trouble doing this line by line.

Anyway, ignoring that wrinkle for a second, after you've matched an article, then first you need to get the next word on which you need to base your capitalization. Then you need to create a new string version of your adjective that has the correct capitalization:

string adj_buf;  // working copy of adj whose case is adjusted below

while(iss >> word){
    for(int i = 0; i < 14; i++){  // articles has 14 entries, so valid indices are 0..13
        if(word == articles[i]){
            cout << word + " ";
            if (!(iss >> word))  // no more words on this line: nothing to base the case on
                break;
            adj_buf = adj;
            // copy the case of the following word, character by character
            for (size_t j = 0; j < word.size() && j < adj.size(); ++j)
                if (isupper(static_cast<unsigned char>(word[j])))
                    adj_buf[j] = toupper(static_cast<unsigned char>(adj[j]));
                else
                    adj_buf[j] = tolower(static_cast<unsigned char>(adj[j]));

            cout << adj_buf + " " + word;
            break;
        }
    }
}

Circling back to the wrinkle we ignored: you probably don't want to do this line by line and then token by token, because handling an article at the end of a line will make your control flow ugly. Instead, you probably want to do it token by token in a single loop.

So, you need to write a helper function or class that operates on the file and can give you the next token. (There probably is exactly such a class already in the STL, I'm not sure.) Anyway, using your I/O it might look something like:

struct FileTokenizer
{
    FileTokenizer(const string &fileName) : rfile(fileName.c_str()) {}

    // Stores the next whitespace-delimited token in 'token'; returns false at end of file.
    bool getNextToken(string &token)
    {
        while (!(iss >> token))
        {
            string line;

            if (!getline(rfile, line))
                return false;

            iss.clear();    // clear the eof/fail flags left by the failed extraction
            iss.str(line);  // then point the string stream at the new line
        }

        return true;
    }

private:
    ifstream      rfile;
    istringstream iss;
};

And your main loop would then look like:

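FileTokenizer tokenizer(file);

while (tokenizer.getNextToken(word))
{
    for(int i = 0; i < 14; i++){  // articles has 14 entries, so valid indices are 0..13
        if(word == articles[i]){
            cout << word + " ";

            if (!tokenizer.getNextToken(word))  // the file ends with an article: insert nothing
                break;

            adj_buf = adj;
            // copy the case of the following word, character by character
            for (size_t j = 0; j < word.size() && j < adj.size(); ++j)
                if (isupper(static_cast<unsigned char>(word[j])))
                    adj_buf[j] = toupper(static_cast<unsigned char>(adj[j]));
                else
                    adj_buf[j] = tolower(static_cast<unsigned char>(adj[j]));

            cout << adj_buf + " " + word;
            break;
        }
    }
}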

You probably want to output the rest of the input too?
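One way to echo the rest of the input as well is sketched below. It is only a sketch under some assumptions: the names `isArticle`, `copyCase` and `insertAdjectives` are made up for illustration, the stream's own `operator>>` is used as the tokenizer (it skips newlines as well as spaces, so an article at the end of a line is paired with the first word of the next line), and all whitespace in the output is normalized to single spaces. As in the loop above, the word that follows an article is not itself checked for being an article.

```cpp
#include <cctype>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical helper: true if 'word' is "a", "an" or "the" in any mix of case.
bool isArticle(const std::string& word) {
    std::string lower;
    for (unsigned char c : word)
        lower += static_cast<char>(std::tolower(c));
    return lower == "a" || lower == "an" || lower == "the";
}

// Hypothetical helper: copy the case of 'next' onto 'adj', character by character.
std::string copyCase(std::string adj, const std::string& next) {
    for (std::size_t j = 0; j < adj.size() && j < next.size(); ++j) {
        const unsigned char a = static_cast<unsigned char>(adj[j]);
        const unsigned char n = static_cast<unsigned char>(next[j]);
        adj[j] = static_cast<char>(std::isupper(n) ? std::toupper(a) : std::tolower(a));
    }
    return adj;
}

// Echo every word of 'in' to 'out', inserting 'adj' after each article.
void insertAdjectives(std::istream& in, std::ostream& out, const std::string& adj) {
    std::string word;
    bool first = true;
    while (in >> word) {
        out << (first ? "" : " ") << word;  // every word is echoed, article or not
        first = false;
        // An article at the very end of the input gets no adjective.
        if (isArticle(word) && (in >> word))
            out << ' ' << copyCase(adj, word) << ' ' << word;
    }
    out << '\n';
}

// Convenience wrapper for trying the function on an in-memory string.
std::string insertAdjectives(const std::string& text, const std::string& adj) {
    std::istringstream in(text);
    std::ostringstream out;
    insertAdjectives(in, out, adj);
    return out.str();
}
```

Passing an opened `ifstream` and `cout` to the three-argument version would process the file directly.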

edited Feb 13 '15 at 21:16

answered Feb 13 '15 at 20:17


Unfortunately I do need to handle an article at the end of a line, and if the file ends with an article it's not supposed to have an adjective inserted after it. Is there a better way to search through the file then? – illusiate Feb 13 '15 at 20:26
Yeah, I just posted it :) – jschultz410 Feb 13 '15 at 20:38
what if i did something like this? rfile.open(file.c_str()); if(rfile.fail()){ cerr << "Error while attempting to open the file." << endl; return 0; } string nextToken; while (rfile >> nextToken) { //cout << nextToken << endl; for(int i = 0; i <= 14; i++){ if(nextToken == articles[i]){ //cout << nextToken + " " << endl; } } } rfile.close(); return 0; – illusiate Feb 13 '15 at 21:05
Another bit of tricky-ness to handle. What are you supposed to do if you have back to back articles: "a a SHARK"??? To handle that properly you might want to add a peekNextToken() to your helper class for extracting the next word when you find an article match. The underlying string stream probably already has a peek capability. I also added a "break" to the inner if just to skip needless comparisons after a match. – jschultz410 Feb 13 '15 at 21:06
if its "a a SHARK" it would need to be "a HAPPY a SHARK" – illusiate Feb 13 '15 at 21:08
Yes, if ifstream already does the necessary tokenization for you, then that can be your tokenizer class instead of rolling your own. I was merely imitating the way you were handling the I/O. – jschultz410 Feb 13 '15 at 21:10
"if its "a a SHARK" it would need to be "a HAPPY a SHARK"" That doesn't sound right to me. Based on the rules you said earlier I would think it would be "a happy a HAPPY SHARK". You also found a bug in my code. You need to initialize adj_buf with the entire contents of adj before you start and get rid of the nul termination. I'll update my code. – jschultz410 Feb 13 '15 at 21:11
Also, I assumed that the capitalization of the adjective had to be based character by character on the word following the article. If it is just based on the first character, then you don't need the transformation loop. Just output an all lower case or all upper case version of the adjective based on the first character of the word following the article. – jschultz410 Feb 13 '15 at 21:20
Okay so the "a a SHARK" is a special case we were told of, and we were told to do "a HAPPY a SHARK." Then the rest are like this: "a SHARK" - "a HAPPY SHARK" "a sHARK" - "a happy sHARK" "a Shark" - "a Happy Shark" "a shark" - "a happy shark" – illusiate Feb 13 '15 at 21:49
Ok, then get rid of the transformation loop and instead check which of the 3 cases you are in based on the first two characters of the word (watch out for single character words!) and output the proper form of the adjective. You probably just want to have the 3 proper forms of the adjectives set up in variables before you start processing. – jschultz410 Feb 13 '15 at 22:13
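The three-case rule from the last two comments might be sketched as follows. The function name `adjectiveFor` is hypothetical. Two choices here are assumptions that the thread does not settle: a single uppercase character (such as "A") is treated like an all-caps word, and a word that does not start with an uppercase letter (including one starting with a digit or punctuation) gets the lowercase adjective.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: pick the lowercase, UPPERCASE or Capitalized form of 'adj'
// from the first two characters of the word that follows the article.
std::string adjectiveFor(const std::string& adj, const std::string& next) {
    if (adj.empty() || next.empty())
        return adj;

    // Build the three candidate forms of the adjective.
    std::string lower, upper;
    for (unsigned char c : adj) {
        lower += static_cast<char>(std::tolower(c));
        upper += static_cast<char>(std::toupper(c));
    }
    std::string capitalized = lower;
    capitalized[0] = upper[0];

    const bool firstUpper = std::isupper(static_cast<unsigned char>(next[0])) != 0;
    if (!firstUpper)
        return lower;  // "shark" and "sHARK" both give "happy"

    // Assumption: a one-character uppercase word counts as all caps.
    const bool secondUpper =
        next.size() < 2 || std::isupper(static_cast<unsigned char>(next[1])) != 0;
    return secondUpper ? upper : capitalized;  // "SHARK" gives "HAPPY", "Shark" gives "Happy"
}
```

The back-to-back article case ("a a SHARK") is deliberately left out, since it depends on a special rule stated only in the comments.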

score 0 · Answer 2 · edited May 23 '17 at 11:49

First, I suggest using three auxiliary functions to transform string case. They will be useful if you work a lot with text. Here they are based on <algorithm>, but many other approaches are possible:

string strtoupper(const string& s) {   // return the uppercase of the string
    string str = s; 
    std::transform(str.begin(), str.end(), str.begin(), ::toupper);
    return str; 
}
string strtolower(const string& s) {    // return the lowercase of the string
    string str = s;
    std::transform(str.begin(), str.end(), str.begin(), ::tolower);
    return str;
}
string strcapitalize (const string& s) {  // return the capitalisation (1 upper, rest lower) of the string
    string str = s;
    std::transform(str.begin(), str.end(), str.begin(), ::tolower);
    if (str.size() > 0)
        str[0] = toupper(str[0]); 
    return str;
}

Then a utility function to clone the capitalisation of a word: it converts the adjective to lowercase, to uppercase, or capitalizes it (first letter upper, rest lower), copying the case of the reference word. It's robust enough to handle empty words and words which do not start with a letter:

string clone_capitalisation(const string& a, const string& w) {
    if (w.size() == 0 || !isalpha(w[0]))  // empty or not a letter
        return a;                         //   => use adj as it is
    else {
        if (islower(w[0]))   // lowercase
            return strtolower(a);
        else return w.size() == 1 || isupper(w[1]) ? strtoupper(a) : strcapitalize(a);
    }
}

None of these functions changes the original strings!

Now to main(): I don't like having to manually list every combination of upper and lowercase for the articles, so I compare in uppercase only.

I also don't like going sequentially through all possible articles for every word. If there were many more articles, it would not perform well! So I prefer to use a <set>:

...
set<string> articles  { "A", "AN", "THE" };   // shorter, isn't it? (needs <set> and C++11)
...
while (getline(rfile, line)) {
    istringstream iss(line);
    string word;
    while (iss >> word) {     // loop over the words of the line
        cout << word << " ";  // output the word in any case
        if (articles.find(strtoupper(word))!=articles.end()) {  // article found ?
            if (iss >> word) {  // then read the next word
                cout << clone_capitalisation(adj, word) << " " << word << " ";
            }
            // no next word on this line: the article was already printed above,
            // so there is nothing more to output
        }
    }
    cout << endl; 
}
