Reading multi-word strings from a file

Question

I have an input file containing a number of input values. The input for one object looks like this:

hss cscf "serving cscf" 32.5 ims 112.134

(Note: when an object's variable needs a multi-word string, I enclose it in "...."; single-word strings have no quotes.)

How can I read this using ifstream? (I searched Google but didn't find anything.)

I tried reading the entire line using getline, but got stuck when it came to determining whether each value is a single-word or a multi-word input.

Please give some suggestions for this.

OK, I thought of reading a line and then scanning it char by char. If the char is '"', I know it's a multi-word value. But I got stuck when it comes to an integer or float. For chars you can use if(line[i]>='a'&&line[i]<='z'), but how do I proceed when the next value is an integer or float? — Jigyasa, Aug 09 '13 at 11:30
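One way to handle the integer/float part of the comment above is to stop inspecting characters one at a time and instead let the C library try to parse each whitespace-separated token. The sketch below assumes the tokens have already been split out; TokenKind and classify are hypothetical names, not taken from either answer. It relies on the end pointer that std::strtol and std::strtod report: if a parse consumed the entire token, the token is a number of that kind.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper (not from the original answers): classify one
// whitespace-separated token taken from an input line.
enum class TokenKind { Integer, Float, Word };

TokenKind classify(const std::string& tok) {
    if (tok.empty()) return TokenKind::Word;
    const char* begin = tok.c_str();
    char* end = nullptr;

    // strtol consumes as many characters as form a valid integer;
    // if it consumed the whole token, the token is an integer.
    std::strtol(begin, &end, 10);
    if (end != begin && *end == '\0') return TokenKind::Integer;

    // Otherwise try a floating-point parse of the whole token.
    std::strtod(begin, &end);
    if (end != begin && *end == '\0') return TokenKind::Float;

    // Nothing numeric consumed the full token, so treat it as a word.
    return TokenKind::Word;
}
```

Note that std::strtod also accepts forms such as inf, nan, 1e5 and hexadecimal floats, so a word like nan would be classified as a float; tighten the check if your data can contain such words.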

score 1 · Accepted Answer · answered Aug 09 '13 at 12:50

Hope this program helps you out

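#include <cstdio>
#include <cstring>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main()
{
    ifstream fstr("abc.txt");
    if (!fstr)
    {
        cerr << "Cannot open abc.txt" << endl;
        return 1;
    }

    string str;
    vector<string> Vec;
    while (getline(fstr, str))
    {
        // strtok writes into its input, so tokenize a writable copy of the
        // line rather than casting away the const of str.c_str().
        vector<char> buf(str.begin(), str.end());
        buf.push_back('\0');

        bool flag = false;   // true while inside a quoted string
        string strTmp;       // collects the words of the current quoted string
        char* pch = strtok(buf.data(), " ");
        while (pch != NULL)
        {
            string word(pch);
            if (!flag && word[0] == '\"')
            {
                // Opening quote: start a new multi-word value.
                flag = true;
                strTmp = word.substr(1);
            }
            else if (flag)
            {
                // Inside quotes: keep collecting words.
                strTmp += " " + word;
            }
            else
            {
                // Unquoted single-word value.
                Vec.push_back(word);
            }

            // Closing quote (it may end the same word that opened it, e.g. "serving").
            if (flag && !strTmp.empty() && strTmp[strTmp.size() - 1] == '\"')
            {
                strTmp.erase(strTmp.size() - 1);   // drop the closing quote
                Vec.push_back(strTmp);
                flag = false;
            }
            pch = strtok(NULL, " ");
        }
    }

    for (auto itr = Vec.begin(); itr != Vec.end(); ++itr)
    {
        cout << *itr << endl;
    }
    getchar();
}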

A summary of how it works:

1. Extract each line and split it into words using strtok with a space as the delimiter. (At this stage, even the words inside quotes are extracted as separate single words rather than as one multi-word value.)
2. For each extracted word, check whether it begins with a quote. If not, add it to the vector; otherwise, add it to a temporary string and set a flag.
3. For each subsequent word, check whether the flag is set and whether the word ends with a quote. If both hold, append the word and add the whole temporary string to the vector; otherwise, keep appending words to the temporary string.

In short, words inside quotes are held in a temporary string while single words are added directly to the vector. When the closing quote is reached, the temporary string is added to the vector as well.
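For comparison, here is a more compact sketch of the same idea, assuming a C++14 compiler. The std::quoted manipulator from <iomanip> reads a value enclosed in double quotes as one string (dropping the quotes) and falls back to reading an unquoted value as a single word, which matches the input format in the question. The Object struct, its field names, and parse_line are hypothetical; only the field order and types are taken from the sample line.

```cpp
#include <iomanip>   // std::quoted (C++14)
#include <istream>
#include <sstream>
#include <string>

// Hypothetical record type: field names are illustrative; only the order
// and types follow the sample line  hss cscf "serving cscf" 32.5 ims 112.134
struct Object {
    std::string word1;    // hss
    std::string word2;    // cscf
    std::string phrase;   // serving cscf (quoted in the file)
    double num1 = 0.0;    // 32.5
    std::string word3;    // ims
    double num2 = 0.0;    // 112.134
};

// std::quoted strips the quotes from a "multi word" value and reads an
// unquoted value like plain >>, so every string field can use it.
std::istream& operator>>(std::istream& in, Object& o) {
    return in >> std::quoted(o.word1) >> std::quoted(o.word2)
              >> std::quoted(o.phrase) >> o.num1
              >> std::quoted(o.word3) >> o.num2;
}

// Parses one line obtained with std::getline; returns false on bad input.
bool parse_line(const std::string& line, Object& o) {
    std::istringstream in(line);
    return static_cast<bool>(in >> o);
}
```

With an std::ifstream, a loop such as while (file >> o) { ... } should read consecutive objects, since both std::quoted and the numeric extractors skip leading whitespace, including newlines. Reading line by line with getline and calling parse_line instead keeps one malformed line from throwing off the rest of the file.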

score 1 · Answer 2 · edited May 23 '17 at 11:56

Since you're parsing input from a file stream and some values may span multiple words, one generic and fully customizable approach (one that lets you parse any type of input) is to use regular expressions.

You could use C++11's <regex>, but at the time of writing it was not implemented in GCC (libstdc++ gained working std::regex support in GCC 4.9).

So one solution is to use the Boost C++ libraries, which should work with C++98, C++03, and C++0x:

#include <iostream>
#include <string>
#include <boost/regex.hpp>
using namespace std;

int main() {
  string text = "hss cscf \"serving cscf\" 32.5 ims 112.134";

  // Fields: word, word, "quoted text", number, word, number.
  // "([^"]*)" captures everything between the quotes, spaces included;
  // [0-9]+(?:\.[0-9]+)? accepts integers and decimals of any precision.
  boost::regex e("(\\w+)\\s+(\\w+)\\s+\"([^\"]*)\"\\s+([0-9]+(?:\\.[0-9]+)?)\\s+(\\w+)\\s+([0-9]+(?:\\.[0-9]+)?)");

  // regex_match requires the whole line to match; each capture group is one field.
  boost::smatch m;
  if (boost::regex_match(text, m, e)) {
    for (size_t i = 1; i < m.size(); ++i) {
      cout << m[i] << endl;
    }
  }

  return 0;
}

You can compile it with GCC (I used GCC 4.7.2) as follows:

g++ {filename} -std={language version} -I{your boost install location} -L{your boost library location} -o {output filename} {your boost library location}/libboost_regex.a

As for why the regex is so long: it spells out every field explicitly, including the optional decimal part of each number, so that it accepts strings such as the following:

"hss cscf \"serving\" 32.5 ims 112.134"
"hss cscf \"serving more than one\" 32.5 ims 112.134"
"hss cscf \"serving\" 32 ims 112"
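On a compiler with a working <regex> (GCC 4.9 or later), a field-agnostic alternative is to tokenize with std::sregex_iterator instead of spelling out every field in one pattern. The sketch below is an illustration under that assumption rather than part of the answer above; split_values is a hypothetical name. The pattern "([^"]*)"|(\S+) matches either a quoted string (captured without its quotes in group 1) or any other run of non-space characters (group 2).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits a line into values. A "quoted string" becomes
// one value (without its quotes); any other run of non-space characters is
// a value on its own.
std::vector<std::string> split_values(const std::string& line) {
    // Group 1: text between double quotes. Group 2: a bare word or number.
    static const std::regex token("\"([^\"]*)\"|(\\S+)");
    std::vector<std::string> values;
    for (std::sregex_iterator it(line.begin(), line.end(), token), end;
         it != end; ++it) {
        const std::smatch& m = *it;
        values.push_back(m[1].matched ? m[1].str() : m[2].str());
    }
    return values;
}
```

Because this does not fix the number or order of fields, converting numeric values (for example with std::stod) is left to the caller. An unterminated quote is not reported as an error: that token is returned with its leading quote still attached.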

References:

Boost Regex: http://www.solarix.ru/for_developers/api/regex-en.html
