My current project is to take a dataset with aggregated data (e.g. time intervals like '2002-2006') and turn it into de-aggregated rows (one row per year: one row each for 2002, 2003, 2004, 2005, and 2006).
To create the new rows I have to parse each aggregate row, do some calculations, and assemble the results into the new rows. I've already written the code that does all of that.
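For concreteness, the de-aggregation step looks roughly like this (a minimal sketch only; `period` and `total` are placeholder column names, not my real ones, and splitting the total evenly is just an example calculation):

# Sketch: expand one aggregate row like period = "2002-2006"
# into one row per year, splitting the total evenly across the years.
deaggregate_row <- function(row) {
  years <- seq(as.integer(substr(row$period, 1, 4)),
               as.integer(substr(row$period, 6, 9)))
  data.frame(year = years, value = row$total / length(years))
}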
The issue right now is that I'm looking for a computationally fast way to append rows to the end of a data frame and then check the number of rows in the data frame. I have tried a couple of things, and below you can see some pseudocode to do the row appending.
mydf <- rbind(mydf, newRow1, newRow2)   # append the two new rows
if (nrow(mydf) %% 200 == 0) {           # report progress every 200 rows
  print(nrow(mydf))
}
but that gets slow pretty fast (as far as I understand, rbind copies the entire data frame on every call, so each append costs more than the last).
I also tried using a counter and semi-explicitly writing the data into the proper rows, like this:
mydf[2 * counter - 1, ] <- newRow1      # write the rows at computed positions
mydf[2 * counter, ] <- newRow2
if (nrow(mydf) %% 200 == 0) {
  print(nrow(mydf))
}
but that slows down rather quickly too (perhaps because each indexed assignment to a data frame still triggers a copy?).
Is there a fast way to do this? I have about 200,000 input rows, so even the simple example above would produce a 1,000,000-row output and a whole lot of row appends. Is the slowness simply a function of the computing resources available to my machine? Would it go faster if I didn't ask it to print progress? Or should I just accept that this will take a really long time, start it, and let it run overnight?
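One alternative I'm wondering about (an untested sketch only; `inputdf` stands in for my real aggregated data frame, and `deaggregate_row` is the sketch from above): collect the new rows in a preallocated list and call rbind just once at the end.

rowList <- vector("list", nrow(inputdf))         # preallocate one slot per input row
for (i in seq_len(nrow(inputdf))) {
  rowList[[i]] <- deaggregate_row(inputdf[i, ])  # each element is a small data frame
  if (i %% 100 == 0) print(i)                    # progress every 100 input rows
}
mydf <- do.call(rbind, rowList)                  # single rbind at the very end

Would that avoid the repeated copying, or does do.call(rbind, ...) run into the same wall?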