I'm trying to find the longest word in a file in C++. I have a solution for that, but the code also counts punctuation as part of a word, and I don't know how to avoid this.

This is the function "get_the_longest_word()":

string get_the_longest_word(const string &file_name){
    int max=0;
    string s,longest_word;
    ifstream inputFile(file_name);

    if(inputFile.is_open())
    {
        while(inputFile>>s)
        {
            if(s.length()>max)
            {
                max=s.length();
                s.swap(longest_word);
            }
        }
        inputFile.close();
    }else
        cout<<"Error while opening the file!!\n";

    return longest_word;
}

Thanks in advance for the help

Maximusrain
  • Has it occurred to you to make no changes to the shown code, read a whitespace-delimited sequence of characters, and then simply delete all punctuation characters from each string? – Sam Varshavchik Nov 28 '21 at 16:10
  • So basically this is the output [link](https://gyazo.com/276f96e5daaf81b0f05676f92b344a32) and it works fine. My point is that I want to remove specific punctuation like commas, semicolons and colons. I tried to use `getline(inputFile, s, ';')`, but of course this reads everything from one ; to the next ; – Maximusrain Nov 28 '21 at 16:17
  • You would do better if you read an entire line into a string and then "parse" it to pick out words. By "parse", I mean find each word as defined by your rules. By my understanding, your rules are contiguous characters that are not whitespace or punctuation. – Anon Mail Nov 28 '21 at 16:17
  • Read the file line by line into a std::string. Remove all punctuation characters. Then use `istringstream myStream(line);` and use the rest of your code on the myStream instead of inputFile. Related: [https://stackoverflow.com/a/19139085/487892](https://stackoverflow.com/a/19139085/487892) – drescherjm Nov 28 '21 at 16:37

1 Answer

C++ has had a good way to specify the character patterns that form a word since C++11: std::regex. It is easy to use and very versatile.

A word consisting of one or more alphanumeric characters can be defined simply as \w+ (note that \w also matches the underscore). Nothing more is needed. If you want a different pattern, that is just as easy to write.

For a program like yours, the complexity and runtime overhead of a regex should not be a problem, so it is reasonable to use one.

Additionally, there is a very convenient iterator for walking over such matches in a std::string: std::sregex_token_iterator. It makes life really simple, because it lets us use many of the algorithms provided by the C++ standard library.
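
As a small illustration of how this iterator walks over the matches, here is a minimal sketch. The helper name `split_into_words` is hypothetical and only serves the example; it collects every \w+ match into a vector.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper for illustration: collects every \w+ match of
// `text` into a vector of strings.
std::vector<std::string> split_into_words(const std::string& text) {
    static const std::regex word{ R"(\w+)" };
    // A default-constructed sregex_token_iterator acts as the end marker.
    std::sregex_token_iterator first(text.begin(), text.end(), word), last;
    // Each match (a std::ssub_match) converts implicitly to std::string.
    return std::vector<std::string>(first, last);
}
```

For the input "Hello, world; it's fine." this yields the five words Hello, world, it, s and fine. The apostrophe is not matched by \w, so "it's" is split in two; if that is not wanted, a different pattern would be needed.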

For example, std::max_element takes two iterators and returns an iterator to the largest element in the given range. This is exactly what we need.
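
To show the algorithm in isolation, here is a minimal sketch that applies std::max_element with a length comparison to a plain vector of strings. The helper name `longest_of` is an assumption made for this example.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper for illustration: returns the longest string in
// `words`, or an empty string if `words` is empty.
std::string longest_of(const std::vector<std::string>& words) {
    auto it = std::max_element(words.begin(), words.end(),
        [](const std::string& a, const std::string& b) { return a.size() < b.size(); });
    // For an empty range max_element returns the end iterator,
    // which must not be dereferenced.
    return it == words.end() ? std::string{} : *it;
}
```

If several words share the maximum length, std::max_element returns the first of them.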

And then the whole program boils down to just a few simple statements.

Please see:

#include <iostream>
#include <fstream>
#include <string>
#include <iterator>
#include <regex>
#include <algorithm>

const std::regex re{ "\\w+" };

std::string getLongestWord(const std::string& fileName) {

    std::string result{};

    // Open the file and check whether it could be opened (if with initializer requires C++17)
    if (std::ifstream ifs{ fileName }; ifs) {

        // Read complete file into a string. Use range constructor of string
        std::string text(std::istreambuf_iterator<char>(ifs), {});

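        // Get the longest word; skip this if the text has no word, since dereferencing the end iterator is undefined
        std::sregex_token_iterator first(text.begin(), text.end(), re), last{};
        if (first != last)
            result = *std::max_element(first, last, [](const std::string& s1, const std::string& s2) { return s1.size() < s2.size(); });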

    } // Error, file could not be opened
    else std::cerr << "\n*** Error. Could not open file '" << fileName << "'\n\n";
    
    return result;
}

int main() {
    std::cout << getLongestWord("text.txt") << '\n';
}
A M