My Goal:

I am trying to write an application in C++ where the user can ask for a certain weather parameter over a certain date range, and the program will find that information on the internet and write it out to a text file. For example, the user can ask for the high temperature for every day between August 2, 2009 and August 10, 2009. The application will then write a text file that looks something like this:

Month     Date     Year     High
8         2        2009     80.3
8         3        2009     76.9
...
8         10       2009     68.4

I already have the rest working: fetching the web pages, parsing the HTML into meaningful values, and writing those values into a database (a text file). I also wrote a function

insert(std::iostream& database, Day day); //Day is a class I defined that contains all the weather information

that finds where this day belongs so the file stays in chronological order, and inserts it there. I have tested this function, and a single call works exactly as it should.

My Problem:

I am now trying to write a function that does this:

void updateDatabase(std::iostream& database, Day start, Day end)
{
    Day currentDay = start;
    while (currentDay.comesBefore(end))
    {
        if (currentDay.notInDatabase(database))
            insert(database, currentDay);
        currentDay = currentDay.nextDay();
    }
}

But unfortunately, the insert() function only works correctly if I call it once per program run. If I call insert() two or more times in a row, only the last day shows up in my text file.

Here is the smallest possible amount of code that reproduces my problem but still runs.

#include <cstdlib> //For exit() and system()
#include <iostream>
#include <fstream>
#include <string>

const std::string FOLDER = "/Users/Jimmy/Desktop/WeatherApp/";
const std::string DATABASE_NAME = FOLDER + "database.txt";

class day
{
public:
    int date;
    int month;
    int year;
    bool comesBefore(int month, int date, int year);
    day(int month, int date, int year)
    {
        this->month = month;
        this->date = date;
        this->year = year;
    }
};

void writeToDatabase(std::iostream& file, day today, bool end = true);
void insertDay(std::iostream& file, day today);

int main()
{
    std::fstream database;

    database.open(DATABASE_NAME);
    if (database.fail())
    {
        std::cout << "Cannot find database.\n";
        exit(1);
    }

    day second(1, 2, 2000);
    insertDay(database, second);

    std::cout << "First day inserted. Press enter to insert second day.\n";
    std::cin.get();

    day third(1, 3, 2000);
    insertDay(database, third);

    std::cout << "Done!\n";
    return 0;
}

bool day::comesBefore(int month, int day, int year)
{
    if (this->year < year)
        return true;
    if (this->year > year)
        return false;
    //We can assume this->year == year.
    if (this->month < month)
        return true;
    if (this->month > month)
        return false;
    //We can also assume this->month == month
    return (this->date < day);
}

void writeToDatabase(std::iostream& file, day today, bool end)
{
    if (end) //Are we writing at the current cursor position or the end of the file?
        file.seekg(0, std::ios::end);
    file << today.month << '\t' << today.date << '\t' << today.year << '\n';
    return;
}

void insertDay(std::iostream& file, day today)
{
    //Clear flags, and set cursor at beginning
    file.clear();
    file.seekg(0, std::ios::beg);

    int date, month, year;
    long long positionToInsert = 0;

    while (!file.eof())
    {
        file >> month >> date >> year;
        //std::cout << month << date << year << '\n';
        if (today.comesBefore(month, date, year))
        {
            //We found the first day that comes after the day we are inserting
            //Now read backwards until we hit a newline character
            file.unget();
            char c = '\0';
            while (c != '\n')
            {
                file.unget();
                c = file.get();
                file.unget();
            }
            positionToInsert = file.tellg();
            break;
        }
    }

    if (file.eof())
    {
        //We hit the end of the file. The day we are inserting is after every day we have. Write at the end.
        file.clear();
        writeToDatabase(file, today);
        return;
    }

    file.clear();
    file.seekg(0, std::ios::beg);
    std::fstream tempFile;
    std::string tempFileName = FOLDER + "tempfile.txt";
    std::string terminalCommand = "> " + tempFileName;

    //Send the command "> /Users/Jimmy/Desktop/WeatherApp/tempfile.txt" to the terminal.
    //This will empty the file if it exists, and create it if it does not.
    system(terminalCommand.c_str());

    tempFile.open(tempFileName);
    if (tempFile.fail())
    {
        std::cout << "Failure!\n";
        exit(1);
    }

    int cursorPos = 0;
    while (cursorPos++ < positionToInsert)
    {
        char c = file.get();
        tempFile.put(c);
    }
    tempFile.put('\n'); //To keep the alignment right.

    writeToDatabase(tempFile, today, false);
    file.get();

    char c = file.get();
    while (!file.eof())
    {
        tempFile.put(c);
        c = file.get();
    }

    terminalCommand = "mv " + tempFileName + " " + DATABASE_NAME;
    //Sends the command "mv <tempFileName> <databaseName>" to the terminal.
    //This command will move the contents of the first file (tempfile) into the second file (database)
    //and then delete the old first file (tempfile)
    system(terminalCommand.c_str());


    return;
}

I added the cin.get() call in main so I could look at my database before and after each insertDay() call. Here is the database before running the program:

1   1   2000
1   4   2000

Here is the database after the first insertDay() call, before hitting enter to move past cin.get():

1   1   2000
1   2   2000
1   4   2000 

And here is the database after I move on past cin.get() and my program exits:

1   1   2000
1   3   2000
1   4   2000

I have changed the dates being inserted, how many dates are inserted, how far apart the dates are, and the initial size of the database, but I always get the same result. After every call to insert(), the database looks as if that were the only call to insert() ever made. However, if I run the program many times, the text file continues to grow. I only get this problem when I call insert() more than once in a single run. So if I were to run this program 5 times:

int main()
{
    std::fstream database;

    database.open(DATABASE_NAME);
    if (database.fail())
    {
        std::cout << "Cannot find database.\n";
        exit(1);
    }

    day today(1, 2, 2000);
    insertDay(database, today);

    std::cout << "Done!\n";
    return 0;
}

My database would end up looking like this:

1   1   2000
1   2   2000
1   2   2000
1   2   2000
1   2   2000
1   2   2000
1   4   2000

I suspect the problem is with one of fstream.clear(), fstream.seekg(), or fstream.eof(), or with closing and reopening the file. But nothing I have tried has fixed it.

Also, it is worth noting that this will not run on a Windows computer. It should be fine on Linux, but I have only tested it on Mac, so I could be wrong. It uses bash commands for creating, emptying, and moving files.

Any help (even just a nudge in the right direction) is HUGELY appreciated. I've been pulling my hair out over this one for a while. Also, I know SO dislikes code dumps, so I have massively simplified the problem. My full program is 700+ lines and 10 different files, and this is about as short as I could get it while still getting the idea across.

DJMcMayhem

1 Answer

The problem you have here has to do with the way you handle the file: when you mv a file onto an existing path, the old file at that path is not overwritten per se; instead it is unlinked ("deleted") and the new file takes its place.

On Unix-like operating systems, you can still retain a handle to an unlinked file: it's just not accessible using a path. This is why on Unix it's perfectly okay to delete a file that's still open, unlike on Windows: the file still exists after you have unlinked it, at least until all the file descriptors to it have been closed. This means that your database stream has not changed at all: it is still pointing to your old file, which still has its old contents.
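
The effect can be reproduced in isolation. The following is a minimal sketch, not part of the original program: the function names and file paths are hypothetical, and std::rename stands in for the mv command. It assumes a Unix-like system, where renaming onto an existing path replaces it.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical demo: opens `path`, then replaces the file at `path` with a
// different file, and finally reads through the stream opened BEFORE the
// replacement.
std::string firstLineThroughStaleHandle(const std::string& path,
                                        const std::string& tempPath)
{
    {
        std::ofstream original(path);
        original << "old contents\n";
    }   // closed here, so the data is on disk

    std::ifstream stale(path);  // handle to the original file

    {
        std::ofstream replacement(tempPath);
        replacement << "new contents\n";
    }

    // Like "mv tempPath path": the original file is unlinked from `path`.
    std::rename(tempPath.c_str(), path.c_str());

    std::string line;
    std::getline(stale, line);  // still reads the unlinked original
    return line;
}

// Reads the first line through a stream opened AFTER the replacement.
std::string firstLineThroughFreshHandle(const std::string& path)
{
    std::ifstream fresh(path);
    std::string line;
    std::getline(fresh, line);
    return line;
}
```

On a Unix-like system, the first function returns "old contents" while the second returns "new contents" for the same path, which matches what happens to the database stream between two insertDay() calls.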

A simple workaround would be to close and reopen the file after each mv. (From a practical perspective, it's probably much better to just use a readily available solution such as SQLite.)
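
One way the close-and-reopen workaround might look is sketched below. This is an illustration under assumptions, not the asker's code: replaceAndReopen and countLines are hypothetical helpers, and std::rename is used in place of system("mv ...") so the sketch does not depend on a shell.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: replaces the file at dbPath with the file at
// tempPath, then closes and reopens `file` so it refers to the new contents.
bool replaceAndReopen(std::fstream& file,
                      const std::string& tempPath,
                      const std::string& dbPath)
{
    file.close();  // drop the handle to the file that is about to be unlinked
    if (std::rename(tempPath.c_str(), dbPath.c_str()) != 0)
        return false;
    file.clear();  // reset eofbit/failbit left over from earlier reads
    file.open(dbPath, std::ios::in | std::ios::out);
    return file.is_open();
}

// Hypothetical helper for checking the result: counts the lines visible
// through the stream, starting from the beginning of the file.
int countLines(std::fstream& file)
{
    file.clear();
    file.seekg(0, std::ios::beg);
    int lines = 0;
    std::string line;
    while (std::getline(file, line))
        ++lines;
    return lines;
}
```

In insertDay(), a call such as replaceAndReopen(file, tempFileName, DATABASE_NAME) would take the place of the final system() call, assuming tempFile has been closed first so that everything written to it has reached the disk. This requires the parameter to be a std::fstream& rather than a std::iostream&.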

Rufflewind
  • Oh, so the fstream object in memory is unchanged, but the file in my directory was deleted? – DJMcMayhem Dec 20 '14 at 06:08
  • It is still pointing to the same old file. The file still exists on your hard disk, despite being "deleted" (i.e. no longer accessible using a path). The file is truly eliminated only when *all file handles to it are closed*. – Rufflewind Dec 20 '14 at 06:10
  • Okay. Is there a more efficient way to go about this? If I need to update 30+ dates it seems like it could take a while if it has to download a file, parse it, insert the data into the database, send bash commands, and close and reopen the file. – DJMcMayhem Dec 20 '14 at 06:13
  • Well, you can cache it and do it in batches instead of one date at a time, but it's likely that the download is going to be a far greater overhead than reopening a file. (Or use a library to do this: writing a reliable and fast database is by no means an easy task. Not to mention, your insertion algorithm is linear in cost, so inserting dates will be quadratic in cost, which is less-than-ideal.) – Rufflewind Dec 20 '14 at 06:17
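
The batching idea from the last comment could be sketched roughly as follows: load the whole file into memory once, add every new day there, and write the file back once. This is only a sketch under assumptions. The Date struct and the loadDays and saveDays functions are hypothetical names, the file is assumed to hold one "month date year" triple per line, and std::set is one possible container, chosen because it keeps entries sorted and drops duplicates.

```cpp
#include <fstream>
#include <set>
#include <string>
#include <tuple>

// Hypothetical stand-in for the question's day class.
struct Date {
    int month;
    int date;
    int year;

    // Chronological order: compare year first, then month, then date.
    bool operator<(const Date& other) const {
        return std::tie(year, month, date)
             < std::tie(other.year, other.month, other.date);
    }
};

// Reads every "month date year" line of the file into a sorted set.
std::set<Date> loadDays(const std::string& path)
{
    std::set<Date> days;
    std::ifstream in(path);
    Date d{};
    while (in >> d.month >> d.date >> d.year)
        days.insert(d);
    return days;
}

// Rewrites the whole file from the set, in chronological order.
bool saveDays(const std::string& path, const std::set<Date>& days)
{
    std::ofstream out(path, std::ios::trunc);
    for (const Date& d : days)
        out << d.month << '\t' << d.date << '\t' << d.year << '\n';
    out.flush();
    return static_cast<bool>(out);
}
```

With this layout, updating 30 dates means one loadDays() call, 30 insertions into the set (each logarithmic in cost), and one saveDays() call, so no temporary file, shell command, or reopened stream is needed per date.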