I am currently struggling with a fairly large dataset (30M rows, 14+ columns) in R using the `data.table` package, on my laptop with 8 GB of RAM running 64-bit Windows 10.
I have been hitting memory limits all day, getting the error that R cannot allocate a vector of slightly over 200 MB. When I look at the Windows Task Manager, I can see that R is currently using 2-3 GB of RAM (total usage, including the system and some other processes, is about 65%). However, when I run `gc()`, the output tells me that about 7800 Mb out of 8012 Mb is currently in use. Running `gc()` a second time shows no change in used memory compared to the previous call.
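For reference, this is roughly how I inspect memory from inside R (just a minimal sketch using base R; the output values shown in comments are illustrative, not my actual numbers):

    # Heap usage as R itself sees it; the 2nd column is Mb used (Ncells / Vcells)
    gc()
    sum(gc()[, 2])          # total Mb that R reports as in use

    # Sizes of the objects in my workspace, largest first, in Mb
    sizes_mb <- sort(sapply(ls(), function(x) object.size(get(x))) / 1024^2,
                     decreasing = TRUE)
    head(sizes_mb)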
When processing the data (i.e. executing some `data.table` command), the process uses pretty much all of the installed memory and also writes some data to disk.
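To give an idea of the kind of command I mean, here is a small illustrative stand-in (the column names and the aggregation are made up; my real table has 14+ columns):

    library(data.table)

    # Hypothetical table of roughly the same size as mine (~30M rows);
    # building it already consumes a noticeable chunk of RAM
    DT <- data.table(id    = sample(1e6, 3e7, replace = TRUE),
                     value = rnorm(3e7))

    # A grouped aggregation like this is when memory usage spikes for me
    agg <- DT[, .(mean_value = mean(value)), by = id]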
What is the reason for the difference between the `gc()` output and what I see in Task Manager? Or, to be more precise, why is the number in Task Manager lower?