I am trying to import data from MongoDB into R (with rmongodb) using:
    mongo.find.all(mongo, namespace, query = query,
                   fields = list('_id' = 0, 'entityEventName' = 1, context = 1, 'startTime' = 1),
                   data.frame = TRUE)
The command works fine for small data sets, but I want to import 1,000,000 documents.
Using system.time and adding limit = X to the command, I measured the import time as a function of the number of documents requested:
    system.time(mongo.find.all(mongo, namespace, query = query,
                               fields = list('_id' = 0, 'entityEventName' = 1, context = 1, 'startTime' = 1),
                               limit = 10000, data.frame = TRUE))
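The measurement described above can be repeated for several limits with a small loop. This is only a sketch: time_by_limit and dummy_fetch are hypothetical names, and dummy_fetch stands in for the real query so the example runs without a database.

```r
# Hypothetical helper: time a fetch function for several limit values.
# `fetch` is any function that takes a single `limit` argument.
time_by_limit <- function(fetch, limits) {
  elapsed <- sapply(limits, function(n) {
    system.time(fetch(n))[["elapsed"]]
  })
  data.frame(size = limits, time = elapsed)
}

# Stand-in for the real query (assumption: replace with the actual import)
dummy_fetch <- function(limit) {
  data.frame(id = seq_len(limit))
}

timings_demo <- time_by_limit(dummy_fetch, c(1, 100, 1000))
print(timings_demo)
```

For the real measurement, fetch would be a function(n) that wraps the mongo.find.all call from the question with limit = n.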
The results:
Data Size (documents)   Time (s)
1                       0.02
100                     0.29
1000                    2.51
5000                    16.47
10000                   20.41
50000                   193.36
100000                  743.74
200000                  2828.33
After plotting the data, I believe the import time grows quadratically with the number of documents, i.e. Import Time = f(Data Size^2). The fitted model is:
Time = -138.3643 + 0.0067807*(Data Size) + 6.773e-8*(Data Size - 45762.6)^2
R^2 = 0.999997
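For anyone who wants to check the fit, here is a minimal sketch that refits a quadratic to the timings from the table using base R. The variable names (timings, fit, ratio) are my own, and lm uses an uncentered parameterisation, so the intercept and linear coefficient will look different from the centered formula above even if the fitted curve agrees.

```r
# Timings copied from the table above (seconds, as reported by system.time)
timings <- data.frame(
  size = c(1, 100, 1000, 5000, 10000, 50000, 100000, 200000),
  time = c(0.02, 0.29, 2.51, 16.47, 20.41, 193.36, 743.74, 2828.33)
)

# Quadratic model: time = a + b*size + c*size^2
fit <- lm(time ~ size + I(size^2), data = timings)
print(coef(fit))
print(summary(fit)$r.squared)

# Quick sanity check: doubling the size from 100000 to 200000
# should multiply the time by roughly 4 if growth is quadratic
ratio <- timings$time[8] / timings$time[7]
print(ratio)  # about 3.8
```

The doubling ratio of about 3.8 for the two largest runs is close to the factor of 4 that quadratic growth would predict.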
- Am I correct that the import time is quadratic in the number of documents?
- Is there a faster command?
Thanks!