0

This is quite a basic problem, so I guess the solution shouldn't be hard, but I couldn't find a simple way to do it, and I couldn't phrase it well enough to find an answer on the internet.
To get to the question, I have a file with information like this:

1988 Godfather 3 33 42
1991 Dance with Wolves 3 35 43
1992 Silence of the lambs 3 33 44

I need to put all of this information into a data structure: let's say an int year, a string name, and three more ints for the numbers. But how do I know whether the next thing I read is a number or not? I never know how long the name is.
Thank you in advance to anyone who takes the time to help with such a basic problem. :)
EDIT: Don't consider movies with numbers in their title.

Rywi
  • 1
    `std::getline` and parse the string into the parts and/or use a better delimiter than a space. – crashmstr Mar 04 '14 at 18:36
  • If you have control over the file format, use something other than space as a delimiter. `,` or `|` would probably suffice. – Zac Howland Mar 04 '14 at 18:45
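
A minimal sketch of what the delimiter suggestion in these comments could look like, assuming the file can be rewritten with `|` between fields (for example `1993|Free Willy 2|3|33|42`). The `Movie` struct, its field names `a`, `b`, `c`, and the function name `parse_delimited` are made up for illustration; the question does not say what the three numbers mean.

```cpp
#include <exception>
#include <sstream>
#include <string>

// Hypothetical record type; a, b and c are placeholder names
// for the three trailing numbers.
struct Movie {
    int year = 0;
    std::string name;
    int a = 0, b = 0, c = 0;
};

// Parses one line of the assumed form "year|name|a|b|c".
// Returns false if a field is missing or a number cannot be converted.
bool parse_delimited(const std::string& line, Movie& out) {
    std::istringstream in(line);
    std::string year, a, b, c;

    // std::getline with a delimiter reads up to (and discards) the next '|'.
    if (!std::getline(in, year, '|') ||
        !std::getline(in, out.name, '|') ||
        !std::getline(in, a, '|') ||
        !std::getline(in, b, '|') ||
        !std::getline(in, c))
        return false;

    try {
        out.year = std::stoi(year);
        out.a = std::stoi(a);
        out.b = std::stoi(b);
        out.c = std::stoi(c);
    } catch (const std::exception&) {
        return false;  // std::stoi throws on non-numeric or out-of-range input
    }
    return true;
}
```

With an unambiguous delimiter the title can contain spaces and digits freely, because the name field simply ends at the next `|`.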

4 Answers

2

You're going to run into major issues when you try to parse other movies, such as Free Willy 2.

Instead, you might load each line into a std::stringstream and rely on the last three chunks being the numbers you're looking for, rather than generalizing with a regular expression.
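
A rough sketch of that idea, not a definitive implementation: split the line into chunks with a std::istringstream, take the first chunk as the year and the last three as the numbers, and treat whatever is left in the middle as the title. The `Movie` struct, its field names and `parse_from_end` are hypothetical names chosen for this example.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type; a, b and c stand for the three trailing numbers.
struct Movie {
    int year = 0;
    std::string name;
    int a = 0, b = 0, c = 0;
};

// Assumes the layout "year title... a b c", separated by whitespace.
bool parse_from_end(const std::string& line, Movie& out) {
    std::istringstream in(line);
    std::vector<std::string> chunks;
    for (std::string chunk; in >> chunk;)
        chunks.push_back(chunk);

    // Need a year, at least one title word and three numbers.
    const std::size_t n = chunks.size();
    if (n < 5)
        return false;

    try {
        out.year = std::stoi(chunks[0]);
        out.a = std::stoi(chunks[n - 3]);
        out.b = std::stoi(chunks[n - 2]);
        out.c = std::stoi(chunks[n - 1]);
    } catch (const std::exception&) {
        return false;  // one of the expected numbers was not a number
    }

    // Everything between the year and the last three chunks is the title,
    // so digits inside the title are left alone.
    out.name.clear();
    for (std::size_t i = 1; i + 3 < n; ++i) {
        if (i > 1)
            out.name += ' ';
        out.name += chunks[i];
    }
    return true;
}
```

Because only the position of a chunk decides whether it is a number, a line such as `1993 Free Willy 2 3 33 42` should still give the title `Free Willy 2`. Runs of spaces inside a title are collapsed to one space.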

Jeremy Villegas
1

Your best bet would be to use C++ regex.

That would give you more fine-grained control over what you want to parse. Examples:

year -> \d{4}
word -> \w+
number -> \d+
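
As a sketch of how these pieces might be combined with std::regex (C++11, default ECMAScript syntax): the pattern below is an assumption, not part of the answer. It swaps `\w+` for a lazy `.+?` so that multi-word titles match, and it anchors the three numbers at the end of the line. `Movie`, its field names and `parse_with_regex` are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical record type; a, b and c stand for the three trailing numbers.
struct Movie {
    int year = 0;
    std::string name;
    int a = 0, b = 0, c = 0;
};

// Assumes the layout "year title... a b c". std::regex_match requires the
// whole line to match, so the last three numbers are pinned to the end.
bool parse_with_regex(const std::string& line, Movie& out) {
    static const std::regex pattern(
        R"(\s*(\d{4})\s+(.+?)\s+(\d+)\s+(\d+)\s+(\d+)\s*)");

    std::smatch m;
    if (!std::regex_match(line, m, pattern))
        return false;

    // Note: std::stoi can still throw std::out_of_range
    // for extremely long digit runs.
    out.year = std::stoi(m[1].str());
    out.name = m[2].str();
    out.a = std::stoi(m[3].str());
    out.b = std::stoi(m[4].str());
    out.c = std::stoi(m[5].str());
    return true;
}
```

Here `\d` matches a digit, `\w` a letter, digit or underscore, `{4}` means exactly four repetitions and `+` means one or more. The parentheses are capture groups, which is how the individual fields are pulled out of the match.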
DhruvPathak
  • I'm actually quite a newbie at C++, could I get a better explanation how that works? I think I understand that d stands for decimal, w for a letter(?), but how does this all work in a general context? – Rywi Mar 04 '14 at 18:52
  • 2
    This will be very difficult to get right when the movie title ends in a digit (e.g. "Ice Age 2"), or when the movie title is an actual number (e.g. "300"). – Zac Howland Mar 04 '14 at 18:56
  • @ZacHowland Ohh thanks for the correction. I did not pay attention to the fact that the text is movie titles, and can possibly contain numbers. – DhruvPathak Mar 05 '14 at 06:51
0

If you do not have control over the file format, you may want to do something along these lines (pseudo-process):

1) read in a line from the file
2) reverse the order of the "words" in the line
3) read in the 3 ints first (they come out in reverse order)
4) read in the rest of the stream as a string
5) reverse the "words" in the new string
6) read in the year
7) the remainder will be the movie title
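
The process above could be sketched roughly as follows. This is only one way to do it: `reverse_words`, `parse_reversed` and the `Movie` struct with its field names are hypothetical, and the word reversal goes through a std::vector rather than reversing in place.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Reverses the order of whitespace-separated words: "a b c" -> "c b a".
std::string reverse_words(const std::string& text) {
    std::istringstream in(text);
    std::istream_iterator<std::string> first(in), last;
    std::vector<std::string> words(first, last);
    std::reverse(words.begin(), words.end());

    std::string result;
    for (const std::string& w : words) {
        if (!result.empty())
            result += ' ';
        result += w;
    }
    return result;
}

// Hypothetical record type; a, b and c stand for the three trailing numbers.
struct Movie {
    int year = 0;
    std::string name;
    int a = 0, b = 0, c = 0;
};

bool parse_reversed(const std::string& line, Movie& out) {
    // Reverse the words so the three ints are at the front of the stream.
    std::istringstream rev(reverse_words(line));
    // They come out last-first, so fill c, then b, then a.
    if (!(rev >> out.c >> out.b >> out.a))
        return false;

    // Take the rest of the stream and put its words back in order.
    std::string rest;
    std::getline(rev, rest);
    std::istringstream fwd(reverse_words(rest));

    // Now the year is first and the remainder is the title.
    if (!(fwd >> out.year))
        return false;
    out.name.clear();
    std::getline(fwd >> std::ws, out.name);
    return !out.name.empty();
}
```

The same `reverse_words` helper is reused for both reversal steps.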
Zac Howland
  • I really like the idea, may I ask you what method should I use to reverse it? – Rywi Mar 04 '14 at 19:06
  • @Rywi You would have to write that yourself ([there are other SO questions that do this](http://stackoverflow.com/questions/1009160/reverse-the-ordering-of-words-in-a-string)), but you could reuse it for steps 2 and 4. – Zac Howland Mar 04 '14 at 19:09
0

Read every field as a string and then convert the appropriate strings to integers.

1) Initially
  1983
  GodFather
  3
  33
  45
  are all strings, stored in a vector of strings (vector<string>).

2) Then the first string (1983) is converted to an integer using atoi, and the last three strings are converted to integers as well. The strings in between make up the movie_name.

The following code was written under the assumption that the format of the input file has already been validated.

    #include <cstdlib>
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <vector>
    using namespace std;

    int main(int argc, char* argv[]) {
        if (argc < 2)
            return 1;

        // open the input file for reading
        ifstream ifile(argv[1]);
        string input_str;

        // read each line
        while (getline(ifile, input_str)) {
            // split the line into whitespace-separated strings
            stringstream sstr(input_str);
            vector<string> strs;
            string str;
            while (sstr >> str)
                strs.push_back(str);

            // skip lines too short to hold a year, a name and three numbers
            size_t num_of_strs = strs.size();
            if (num_of_strs < 5)
                continue;

            // first string is the year
            int year = atoi(strs[0].c_str());

            // last three strings are numbers
            int one_num = atoi(strs[num_of_strs - 3].c_str());
            int two_num = atoi(strs[num_of_strs - 2].c_str());
            int three_num = atoi(strs[num_of_strs - 1].c_str());

            // the strings in between form the movie name
            string movie_name;
            for (size_t i = 1; i < num_of_strs - 3; i++) {
                if (i > 1)
                    movie_name += " ";
                movie_name += strs[i];
            }

            cout << year << " | " << movie_name << " | " << one_num << " "
                 << two_num << " " << three_num << "\n";
        }
        return 0;
    }

IMHO, changing the delimiter in the file from a space to some other character such as , or ; or : will simplify the parsing significantly. For example, if the data specification later changes so that either the last three or the last four fields can be integers, the code above will need major refactoring.

Ravi