
I am using a Lenovo laptop (CPU @ 2.20GHz, 7.86 GB of usable memory, 64-bit Windows 8) and analyzing datasets in RStudio, usually with over 250,000 rows. The function reads a table (called ppt), goes through all of its rows, and makes decisions through the statements in the body of the while loop:

while (i < (length(ppt[,1]) - 192)) {
    print(i)
    # ... decision logic omitted ...
    i <- i + 1
}

After the code had been running for some hours without finishing, I inserted the print(i) into the function to trace it. For a table with 294991 rows (file size = 6.17 MB), i goes from 20 to 270781 in about 14 seconds; then it stops, and no more values of i are printed, which I take to mean the code is no longer progressing even though it is still running. In fact, I have to hit STOP in order to continue working with RStudio.

Then I deleted some rows of this dataset, leaving 147635 rows. Same thing, except that now i goes from 20 to 147400 (in about 8 seconds), and the code seems to keep running while printing no more i's.

I shortened the data further, to 37000 rows. Now i goes all the way to the last row and the code finishes running.

Sample data:

> ppt<- read.csv("Flow_pptJoint - Copy - Copy.csv")
> ppt[60:70,]
              date precip flow NA.
60 12/1/2003 14:45     NA   85  NA
61 12/1/2003 15:00     NA   85  NA
62 12/1/2003 15:15     NA   85  NA
63 12/1/2003 15:30     NA   85  NA
64 12/1/2003 15:45     NA   85  NA
65 12/1/2003 16:00     NA   83  NA
66 12/1/2003 16:15     NA   83  NA
67 12/1/2003 16:30     NA   83  NA
68 12/1/2003 16:45     NA   83  NA
69 12/1/2003 17:00     NA   83  NA
70 12/1/2003 17:15     NA   83  NA

I was wondering whether this could be a memory problem and, if so, how I could approach the issue.

    Well, use the task manager to check memory use. However, that `while` loop is a sure sign of badly written R code, which would be expected to be slow. R is not C and even there you would use a `for` loop (but you probably shouldn't use an explicit R loop at all). Show your code, [provide some example data](http://stackoverflow.com/a/5963610/1412059) and people can show you better alternatives. – Roland Jul 11 '14 at 07:20
  • Thank you Roland for your response. I did not show my code because, with several nested if statements, it would be difficult for me to explain in an understandable way (I wrote it to accomplish a very specific task in the hydrology field). While I agree that there must be better ways to write code that accomplishes the same task, I am pretty new to programming (no more than 5 months) and this was the only way I found to change the index of the loop from within the body of the loop. – Alan Alves Jul 11 '14 at 18:18

1 Answer


Given your hardware, it seems unlikely that you are facing a memory issue (by the way, it is generally helpful to give the number of columns as well as rows, to give a more accurate idea of the size of the data). Also, memory problems in R generally end with an explicit error such as "Error: cannot allocate vector of size ..." rather than a silent hang.
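You can check this from within R itself; for example (object.size() and gc() are base functions, and memory.limit() is Windows-only):

    object.size(ppt)   # in-memory size of the data frame
    gc()               # run garbage collection and report memory in use
    memory.limit()     # Windows-only: memory available to R, in MB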

This looks more like an endless loop. Check your loop conditions and the specific rows of data on which they get stuck. One way to do this is to put a browser() statement in the iteration of the loop that gets stuck.
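For instance, a minimal sketch (the threshold 270780 comes from your trace on the 294991-row run; the decision logic is a placeholder):

    i <- 20
    while (i < (length(ppt[,1]) - 192)) {
        if (i >= 270780) browser()   # drop into the interactive debugger just before the hang
        # ... decision logic ...
        i <- i + 1
    }

Once browser() triggers, you can step through with n and inspect the values that keep an inner condition from ever becoming true.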

Also, explicit loops are generally inefficient in R. When possible, consider other approaches (perhaps ddply from the plyr package with a custom function that applies your decision logic per group?).
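As an illustration only (I do not know your actual logic, so grouping by day and taking a mean are invented here):

    library(plyr)

    # Hypothetical example: summarise flow per day instead of walking an index row by row
    ppt$day <- as.Date(ppt$date, format = "%m/%d/%Y %H:%M")
    daily   <- ddply(ppt, "day", summarise, mean_flow = mean(flow, na.rm = TRUE))

Each group is then handled in one vectorized call, which is usually much faster than advancing an index by hand.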

  • In fact, a repeat loop within the while loop became endless every time it approached the end of the data; I figured that out with a browser() statement as you suggested (see the sketch below). Thank you for suggesting the plyr package. Although I do not think it is useful for my current code, it may well be for future solutions. Thank you! – Alan Alves Jul 11 '14 at 18:23
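For future readers, here is a hypothetical sketch of that failure mode (the original loop body was not posted, so the scan-ahead over precip is invented for illustration). Indexing an atomic vector past its last element returns NA, so a repeat that scans ahead for a non-NA value never breaks once it runs off the end of the data:

    # Endless near the end of the data: ppt$precip[j] is NA for j > nrow(ppt),
    # so the break condition can never become TRUE
    j <- i
    repeat {
        j <- j + 1
        if (!is.na(ppt$precip[j])) break
    }

    # Fixed: also break once j passes the last row
    j <- i
    repeat {
        j <- j + 1
        if (j > nrow(ppt) || !is.na(ppt$precip[j])) break
    }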