
If I had to read in a word from a document (one word at a time), and then pass that word into a function until I reach the end of the file, how would I do this?

Keep in mind that a word is any consecutive string of letters and apostrophes (so can't or rojas' is one word). Something like bad-day should be two separate words, and something like to-be-husband should be three separate words. I also need to ignore periods (.), semicolons (;), and pretty much anything else that isn't part of a word. I have been reading it in using file >> s; and then removing stuff from the string, but it has gotten very complicated. Is there a way to store only alphabetic characters and apostrophes into s, and stop at the end of a word (when a space occurs)?

while (!file.eof()) {

   string s;
   file >> s;  // this is how I am currently reading it in
   passToFunction(s);    
}
StacksAndParsing
  • Why `eof()` inside a loop is wrong: https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-considered-wrong – Galik May 07 '16 at 01:32
  • What is classified as a space is defined by the locale. You can set it up so that everything other than letters and apostrophes is considered a space. See http://stackoverflow.com/a/6154217/14065 – Martin York May 07 '16 at 03:06

2 Answers


Yes, there is a way: simply write the code to do it. Read one character at a time and collect the characters in a string until you get a non-alphabetic, non-apostrophe character. You've now read one word. Skip ahead until you read the next character that's a letter or an apostrophe, and then take it from the top.

One other thing:

while (!file.eof())

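This is always a bug, and a wrong thing to do: eof() only reports true after a read has already hit the end of the file, so the loop typically runs one extra time with a failed read. Just thought I'd mention this. I suppose that fixing this is going to be your first order of business, before writing the rest of your code.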

Sam Varshavchik
  • Hey, thanks for the response. I fixed it using the method you described and edited it above. Does what I did above make sense? I'm kind of iffy about the end part `if (s.size() != 0) passToFunction(s);`. I did this because I don't want to pass an empty string in if, let's say, the current character read was a number. – StacksAndParsing May 07 '16 at 01:43
  • The s.size() part is fine, but I believe that your code has a bug. If the last character in a file is a letter or an apostrophe (the file does not end with a newline, or any other character), it looks to me like your code will go into an infinite loop, and eventually run out of memory. – Sam Varshavchik May 07 '16 at 01:48
  • Also I tested it and it isn't accounting for spaces between words. EDIT: I changed it to file.get(c); and now it does – StacksAndParsing May 07 '16 at 01:51
  • Thanks I will take a look at that – StacksAndParsing May 07 '16 at 01:52

OnlyLetterNumAndApp facet for a stream

#include <locale>
#include <string>
#include <fstream>
#include <iostream>

// This facet treats letters/numbers and apostrophe as alpha
// Everything else is treated like a space.
//
// This makes reading words with operator>> very easy to use
// when you want to ignore all the other characters.
class OnlyLetterNumAndApp: public std::ctype<char>
{
    public:
        typedef std::ctype<char>    base;
        typedef base::char_type     char_type;

        OnlyLetterNumAndApp(std::locale const& l)
            : base(table)
        {
            std::ctype<char> const&  defaultCType  = std::use_facet<std::ctype<char> >(l);

            for(int loop = 0;loop < 256;++loop) {
                table[loop] = (defaultCType.is(base::alnum, loop) || loop == '\'')
                     ? base::alpha
                     : base::space;
            }
        }
    private:
        base::mask  table[256];
};

Usage

int main()
{
     std::ifstream  file;
     file.imbue(std::locale(std::locale(), new OnlyLetterNumAndApp(std::locale())));
     file.open("test.txt");

     std::string word;
     while(file >> word) {
         std::cout << word << "\n";
     }
}

Test File

> cat test.txt
This is %%% a test djkhfdkjfd
try another $gh line's
bad-people.Do bad things

Result

> ./a.out
This
is
a
test
djkhfdkjfd
try
another
gh
line's
bad
people
Do
bad
things
Martin York