
I'm storing data on a MongoDB Atlas cluster, but I noticed that even a very simple query takes my script more than 5 seconds to execute. Since I need the query to be as fast as possible, can someone help me figure out if I'm doing something wrong? I'm not having issues with my internet speed, so that is not the problem.

The average record looks like this:

{"_id":{"$oid":"id"},"datetimeraw":"202007061535","rate":{"$numberInt":"950"},"amount":{"$numberDouble":"246.900944"},"datetime":{"$date":{"$numberLong":"1594049700000"}}}

And right now I have a total of 1,000 records. The problem is that I know I'll probably reach 20–30k records, and if I'm having issues right now, I'm afraid that with that number of records it will be unbearable. Could the problem be caused by MongoDB Atlas itself?

Here is my code:

import pandas as pd
from pymongo import MongoClient

# Connect to the Atlas cluster and select the 'mydata' database.
client = MongoClient('mongodb+srv://user:pass@test-2liju.mongodb.net/test?retryWrites=true')
db = client.mydata

# Pull every document in the 'mydata' collection into a DataFrame.
pData = pd.DataFrame(list(db.mydata.find()))

print(pData)
JayK23
  • Try the same query, but fetch the result using the _id of the record. If that's fast, you're missing an index on the field you're querying your records on. – Cptmaxon Jul 06 '20 at 20:09
  • The problem is that I don't need just one of them; I need all of them so that I can use them in a pandas DataFrame. – JayK23 Jul 06 '20 at 20:19
  • Then you're missing the point of having a database... you're supposed to leverage your DB's power to run queries, not put everything in memory in a DataFrame and select data from that. Doing that will always be slower than selecting specific DB entries. If you want something fast and in-memory, I would suggest moving to something like Redis/memcached... not Mongo. – Cptmaxon Jul 06 '20 at 20:25
  • Using Redis can be an option, but when the amount of data gets big, isn't MongoDB better than Redis, given that Mongo is schema-less? – JayK23 Jul 06 '20 at 20:38
  • Redis lists allow you to put whatever objects you want in them. But again, right tool for the right job; I believe you just need to make your queries take only a slice of the data (a sketch of this follows the thread). – Cptmaxon Jul 06 '20 at 20:40
  • What is 5 seconds a measure of? Time to execute the entire program? – D. SM Jul 06 '20 at 20:55
  • Yes exactly, that's what it takes. – JayK23 Jul 06 '20 at 22:23
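
To make Cptmaxon's indexing/slicing suggestion concrete, here is a minimal sketch of a more selective query, assuming the connection string from the question and the field names from the sample record; the one-day prefix filter and the projection are illustrative, not from the original code:

import pandas as pd
from pymongo import MongoClient

client = MongoClient('mongodb+srv://user:pass@test-2liju.mongodb.net/test?retryWrites=true')
db = client.mydata

# One-time setup: index the field the filter uses (a no-op if it already exists).
db.mydata.create_index('datetimeraw')

# Fetch only one day's records (an anchored prefix regex can use the index)
# and project only the fields the DataFrame actually needs.
cursor = db.mydata.find(
    {'datetimeraw': {'$regex': '^20200706'}},  # all records for 2020-07-06
    {'_id': 0, 'datetime': 1, 'rate': 1, 'amount': 1},
)
pData = pd.DataFrame(list(cursor))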

1 Answer


This script runs in less than 1 second (the data set is read-only and public, so you can try this yourself):

import pymongo
from datetime import datetime

# Read-only public demo cluster, so anyone can run this.
c = pymongo.MongoClient(host="mongodb+srv://readonly:readonly@demodata.rgl39.mongodb.net/demo?retryWrites=true&w=majority")

db = c["demo"]
zipcodes = db["zipcodes"]

# Time a full fetch of the collection.
start = datetime.utcnow()
l = list(zipcodes.find())
end = datetime.utcnow()
print(f"Duration: {end - start}")
print(f"Docs count: {len(l)}")

On my laptop it retrieves 29,353 docs in less than a second. Can you try it and see if you are still seeing 6-second delays? Are you running on the free tier? This data set is running on an M10.
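
Note that MongoClient connects lazily: the first operation pays the DNS SRV lookup and TLS handshake, so if the 5 seconds measures the whole program, connection setup may dominate. A minimal sketch that separates setup time from fetch time, using the same read-only cluster:

import pymongo
from datetime import datetime

c = pymongo.MongoClient(host="mongodb+srv://readonly:readonly@demodata.rgl39.mongodb.net/demo?retryWrites=true&w=majority")

t0 = datetime.utcnow()
c.admin.command('ping')  # forces the initial connection to be established
t1 = datetime.utcnow()
docs = list(c["demo"]["zipcodes"].find())
t2 = datetime.utcnow()

print(f"Connection setup: {t1 - t0}")
print(f"Query + fetch: {t2 - t1}")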

Joe Drumgoole
  • There is no way for me to make the query more selective; I need to store that data and use it. What I'm doing is basically fetching data from one script and then retrieving it from another script. I did not think that 1,000 JSON records would be too many to process on the client side. – JayK23 Jul 06 '20 at 20:29
  • How about storing data in arrays? I would basically store more records in a single document, like this: {'datetime': '2020-7-6', 'data':[]}. Would that make it faster? (A sketch of this follows the thread.) – JayK23 Jul 06 '20 at 20:34
  • Does it take 5 seconds every time? I would expect the first fetch to be slow and subsequent fetches to be quicker. Can you move the processing node closer to the database, e.g. by placing an EC2 node in the same region as the DB? – Joe Drumgoole Jul 06 '20 at 23:16
  • Added an example that processes 29k records in less than 1 sec. See above. – Joe Drumgoole Jul 06 '20 at 23:33
  • Now I'm no longer seeing that query time. I'm still trying to figure out whether the problem was my wifi, but it now works without any problem in less than 1 sec. – JayK23 Jul 07 '20 at 08:38
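
As a footnote to the array idea raised in the comments, here is a minimal sketch of that bucket pattern, grouping one day's readings into a single document; the mydata_daily collection name and the document layout are hypothetical, not from the original code:

from pymongo import MongoClient

client = MongoClient('mongodb+srv://user:pass@test-2liju.mongodb.net/test?retryWrites=true')
db = client.mydata

# Append a reading to the day's bucket; upsert creates the bucket on first write.
db.mydata_daily.update_one(
    {'datetime': '2020-7-6'},
    {'$push': {'data': {'datetimeraw': '202007061535', 'rate': 950, 'amount': 246.900944}}},
    upsert=True,
)

Fewer, larger documents mean fewer round trips when reading a whole day back, at the cost of rewriting a growing document on every insert.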