0

i'm running into a bit of speed problems with my little project i'm working on, and I'm confused about indexing and sorting.

First off, my site is in flask (python), and i'm using pymongo to interact with the db. My db has been growing, and now i'm starting to run into performance issues.

My collection is 991.2 KB and when I pull it from into db, i've just been sorting it in the view function. I realize now I should probably be sorting it at the DB level, and probably creating an index, but i have a few really dumb questions about what I should be doing.

  1. Is it even worth creating a new sorted index? The DB is a wiki, so people add stuff to it not infrequently, often multiple times per day. Not sure how that affects having it indexed.

  2. How exactly do I create this index if i want to implement it? I know it should be something like this:

collection.create_index([('field_i_want_to_index', pymongo.TEXT)], name='search_index', default_language='english')

but i'm not sure when and how frequently I should run that line of code? Just once I assume?

Any advice would be helpful. I'm a bit out of my depth here even though I've been working with mongoDB for like 2 years now.

I have just been returning the standard index and sorting it in flask. It works fine, but as my database grows, it seems like an unnecessary step.

Here are two queries that are on my index page, I added the bottom one today:

@cache.memoize(timeout=CACHE_TIMEOUT)
def all_collection(collection_name):
    return list(db()[collection_name].find({}))

@cache.memoize(timeout=CACHE_TIMEOUT)
    def all_courses_alphabetical():
        return list( db()['course'].find({}, sort=[('name', 1)]) )
scoofy
  • 103
  • 8
  • It would be helpful to see the queries you use, and also perhaps how you use the query results in your program. – rickhg12hs Jan 11 '23 at 03:04
  • I've added my two main queries. It's really just a database of information about golf courses. – scoofy Jan 11 '23 at 03:37
  • 1
    Since you have a single key ascending sort, you could just use [`collection.create_index("name")`](https://pymongo.readthedocs.io/en/stable/api/pymongo/collection.html#pymongo.collection.Collection.create_index) (or maybe it's `db()["course"].create_index("name")` for you - I'm not sure how to specify your colection). You could check the `explain` to verify the index is used. – rickhg12hs Jan 11 '23 at 19:20
  • 1
    Okay, yea, i think i will do this, thank you. – scoofy Jan 11 '23 at 20:05

1 Answers1

0

Well as you mentioned that currently, your collection is 991.2 KB, creating an index is overhead, and you can sort the data at the application level, which is quite optimized already.

Creating an index comes with memory consumption and delays in CRUD operations because whenever CRUD operations are performed, the index needs to be modified. Although creating an index will give you the benefit, that the data you will get will always be sorted on the name in ascending order by default, but what you need to decide is that is it that necessary to create index.

FYI you can create index by running

collection.create_index([("name", 1)]) # if you want ascending order
collection.create_index([("name", -1)]) # if you want descending order