
We want to read a text file consisting of several segments into our program:

The structure in the program is a cache keyed by key, where each entry holds a pair<map<string,string>, string> (the header pairs and the data).

The structure in the file is as follows (where key is used both as a key and as a delimiter between segments):

key
headerKey : headerValue
headerKey : headerValue
......................
headerKey : headerValue
key
data
data
...
data
key

We have been trying to read this with the code below, but it doesn't read the date format (RFC 1123) correctly: the dates in the header values come out as "08 GMT" or a similar "XX GMT". The problem with our reading algorithm is that we use : as a delimiter, but it also appears inside the date with a different meaning, namely separating the parts of the time:

    try{

                // Create stream
                ifstream ifs(this->cacheFile.c_str(), ios::binary);

                // Read file to cache if stream is good
                if(ifs.good()){
                    while (! ifs.eof() ){
                        map<string,string> headerPairs;
                        string tmp;
                        string key;
                        string data;

                        getline(ifs, tmp);
                        while(tmp.empty()){
                            getline(ifs, tmp);
                            cout << "Empty line..." << "\n";
                            if(ifs.eof()){
                                cout << "End of File.."<< "\n";
                                break;
                            }
                        }

                        //After empty lines get "Key"
                        key = tmp;
                        getline(ifs, tmp);

                        //Get segment of header pairs
                        while(tmp != key){
                            StringTokenizer headerPair(tmp, ":", StringTokenizer::TOK_TRIM);
                            //StringTokenizer::Iterator it = headerPair.begin();
                            std::cout << *(headerPair.begin()) <<": " << *(headerPair.end()-1)<< std::endl;
                            string headerKey = *(headerPair.begin());
                            string headerValue = *(headerPair.end()-1);

                            headerPairs.insert(make_pair(headerKey, headerValue));
                            getline(ifs, tmp);
                        }

                        cout << "Added " << headerPairs.size() << " header pairs from cache" << "\n";
                        //tmp equals Key

                        while(tmp!=key){
                            getline(ifs, tmp);
                            cout << "Searching for header->data delimiter" << "\n";
                        }
                        cout << "Found header->data delimiter" << "\n";

                        //Get segment of data!
                        getline(ifs, tmp);
                        while(tmp != key){ 
                            data+=tmp;
                            getline(ifs, tmp);
                        }

                        cout << "DATA: " << data << "\n";
                        cout << "Ending delimiter:" << tmp << "\n";

                        this->add(key,make_pair(headerPairs, data));
                        cout << "Added: " << key << " to memory-cache" << endl;

                    }
                    ifs.close();
                }

            }
            catch (Exception &ex){
                cerr << ex.displayText() << endl;
            }

Please suggest a better way of getting the date string:

 DateTime now : Mon, 29 Apr 2013 08:15:57 GMT
 DateRetrieved from file: 57 GMT

In short: the problem is that we are using : as the delimiter for the headers. I would like suggestions for another delimiter character that is failsafe, i.e. one that won't be found in HTTP 1.0 or 1.1 headers.

David Karlsson

1 Answer


You can't find a failsafe delimiter, as someone could always potentially use that character in the data.

However, the way to go is to escape any occurrence of the delimiter in the data before inserting it. Here is how CSV does it:

"Date","Pupil","Grade"
"25 May","Bloggs, Fred","C"
"25 May","Doe, Jane","B"
"15 July","Bloggs, Fred","A"
"15 April","Muniz, Alvin ""Hank""","A"

(Notice the doubled "" where a double quote appears in the data and needs to be escaped.)
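The doubling rule shown in the CSV sample can be sketched in a few lines of C++. The function name csvQuote is hypothetical and chosen only for illustration; the sketch quotes a single field and does not handle whole records or embedded newlines.

```cpp
#include <string>

// Hypothetical helper: quote one CSV field by wrapping it in double quotes
// and doubling every double quote that appears inside the field.
std::string csvQuote(const std::string& field) {
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') {
            out += '"';  // escape an embedded quote by emitting it twice
        }
        out += c;
    }
    out += '"';
    return out;
}
```

Called on the field Muniz, Alvin "Hank", this yields "Muniz, Alvin ""Hank""", matching the last row of the sample above.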

Although doubling the character is common, the most popular way to escape a delimiter is to put a backslash '\' before the character.
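As a minimal sketch of backslash escaping applied to a key/value line, assuming a single-character delimiter: escapeField and splitEscaped are hypothetical names, not part of Poco or the standard library. The writer escapes both the delimiter and the backslash itself; the reader splits at the first unescaped delimiter and removes the escapes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: prefix every delimiter and every backslash with '\'.
std::string escapeField(const std::string& s, char delim) {
    std::string out;
    for (char c : s) {
        if (c == delim || c == '\\') {
            out += '\\';
        }
        out += c;
    }
    return out;
}

// Hypothetical helper: split "key<delim>value" at the first unescaped
// delimiter and unescape both halves. Returns false if there is none.
bool splitEscaped(const std::string& line, char delim,
                  std::string& key, std::string& value) {
    key.clear();
    value.clear();
    std::string* current = &key;
    bool found = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (c == '\\' && i + 1 < line.size()) {
            *current += line[++i];   // escaped character: copy it literally
        } else if (c == delim && !found) {
            found = true;            // first real delimiter: switch to value
            current = &value;
        } else {
            *current += c;
        }
    }
    return found;
}
```

With ':' as the delimiter, the date from the question would be stored as Mon, 29 Apr 2013 08\:15\:57 GMT and read back unchanged. This sketch does not trim whitespace around the delimiter.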

If you want to learn more about this, you can check out the Wikipedia page dedicated to this.
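For the header segment in the question specifically, an alternative to escaping is to split each line at the first colon only. This relies on the assumption that the header name never contains a colon, which holds for HTTP field names, while the value may contain any number of them. The names trim and splitHeader below are hypothetical; this is a sketch using only std::string, not the Poco StringTokenizer from the question.

```cpp
#include <string>

// Hypothetical helper: strip spaces, tabs and carriage returns from both ends.
std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    std::string::size_type begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) {
        return "";
    }
    std::string::size_type end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Hypothetical helper: split "name : value" at the FIRST colon only, so
// colons inside the value (e.g. in a time of day) are left untouched.
// Returns false if the line contains no colon at all.
bool splitHeader(const std::string& line,
                 std::string& name, std::string& value) {
    std::string::size_type pos = line.find(':');
    if (pos == std::string::npos) {
        return false;
    }
    name = trim(line.substr(0, pos));
    value = trim(line.substr(pos + 1));
    return true;
}
```

For the line Date : Mon, 29 Apr 2013 08:15:57 GMT this gives the name Date and the full value Mon, 29 Apr 2013 08:15:57 GMT, rather than the last token 57 GMT.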

Tristan Bourvon
  • According to [this](http://stackoverflow.com/questions/4400678/http-header-should-use-what-character-encoding), the headers won't contain certain characters. We decided to use this special character (✰) in the file to separate header✰value – David Karlsson Apr 29 '13 at 09:51
  • @DavidKarlsson But what if a user uses this character in the data field? This is still not failsafe. – Tristan Bourvon Apr 29 '13 at 10:25
  • The StringTokenizer is only used on the header segment, this segment is separated from the others using a multi-character key... Therefore the data segment is not subject to the split on that character //BR – David Karlsson Apr 29 '13 at 10:44