
I'm currently developing an API with FastAPI.

The API fetches a certain amount of data for a specific date from MariaDB.

I implemented this using FastAPI Pagination, but it took 9 seconds to get a response.

Would there be other ways to improve performance?

I'm attaching a little background below:


  • The current GET API URL

http://localhost:8080/images/{date}?page=N&size=M (e.g. http://localhost:8080/images/20211223?page=1&size=50)

  • Table information

The table has about 10,000,000 rows, and here are a few lines of aggregated data. There is no index in this table, as of now. If needed, I'd add it!

(screenshot of a few sample table rows)

  • The current implementation
# get function part
@router.get('/{stnd_ymd}', response_model=Page[ResponseImage])
async def get_images(stnd_ymd: str):
    image_list = Images.get_all(stnd_ymd=stnd_ymd, cnts_ty_cd="CT_IMAGE")
   return paginate(image_list)

# about `get_all` function
@classmethod
def get_all(cls, session: Session = None, **kwargs):
    sess = next(db.session()) if not session else session
    query = sess.query(cls)
    for key, val in kwargs.items():
        col = getattr(cls, key)
        query = query.filter(col == val)
    result = query.all()
    if not session:
        sess.close()
    return result

If there is any other information needed, please tell me!

Thanks,

dhuhd
  • You should first do more tests and break down the 9 seconds response. Which specific line/s and function/s take the longest time? Is it the creation of the session? Is it the query? Is it the conversion to your model? Is it the pagination? This is still quite vague. – Gino Mempin Jan 04 '22 at 01:24
  • @GinoMempin Thank you very much for your comment. I'll do more tests and add some information on the post. – dhuhd Jan 04 '22 at 02:14
  • It seems like you're retrieving _all 9 million rows_ from the database, then filtering it before returning the result to the client. That's not necessary; make your `get_all` function know about the offset and limit, and then use `query[offset:offset+limit]` to only retrieve the part of the result set that you need from the database. – MatsLindh Jan 04 '22 at 10:40
  • Hi @MatsLindh, thank you for your comment. It turned out that your comment was exactly true. My query tried to get all the results and did the pagination after then that. I changed the query as you suggested to do offset and limit stuff, and it reduced the running time a lot. Thank you very much for your help! – dhuhd Jan 07 '22 at 05:17
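
The approach suggested in the comments above can be sketched like this. It is a minimal, self-contained sketch using an in-memory SQLite database and an illustrative `Image` model (not the asker's real schema); slicing the `Query` makes SQLAlchemy emit `LIMIT`/`OFFSET` so only the requested page is fetched from the database:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Image(Base):
    __tablename__ = "images"
    id = Column(Integer, primary_key=True)
    stnd_ymd = Column(String(8), index=True)  # indexing the filtered column also helps

engine = create_engine("sqlite://")  # in-memory DB standing in for MariaDB
Base.metadata.create_all(engine)

with Session(engine) as sess:
    sess.add_all(Image(id=i, stnd_ymd="20211223") for i in range(1, 101))
    sess.commit()

    page, size = 2, 10
    query = (
        sess.query(Image)
        .filter(Image.stnd_ymd == "20211223")
        .order_by(Image.id)
    )
    # Slicing the Query emits LIMIT/OFFSET instead of fetching all rows
    rows = query[(page - 1) * size : page * size]
    ids = [r.id for r in rows]
    print(ids)  # ids 11 through 20
```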

2 Answers


As some people pointed out in the comments, the slow response time could be caused by a badly optimized query. For example, your `get_all` method always fetches all columns, which may not be the best practice for such a large amount of data, so try fetching only the columns that your frontend/client needs in a specific view. I also think you need to write your own pagination at the query level (which will be faster because it is done by the database itself), because I suspect that fastapi-pagination only paginates data that has already been fetched. Example of pagination using SQLAlchemy:

# convert a 1-based page number into a row offset
offset = (page_number - 1) * items_count
# build one filter per column passed via kwargs
filters = [
    getattr(cls, column_name) == value
    for column_name, value in kwargs.items()
]
query = query.filter(*filters)
# let the database paginate via LIMIT/OFFSET instead of fetching everything
result = query.offset(offset).limit(items_count).all()
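
To illustrate the column-trimming advice above, here is a minimal sketch (illustrative model and column names, in-memory SQLite) using SQLAlchemy's `load_only` option so the generated `SELECT` lists only the needed columns:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, load_only

Base = declarative_base()

class Image(Base):
    __tablename__ = "images"
    id = Column(Integer, primary_key=True)
    stnd_ymd = Column(String(8))
    big_blob = Column(String)  # a wide column the client does not need

engine = create_engine("sqlite://")  # in-memory DB for the sketch
Base.metadata.create_all(engine)

with Session(engine) as sess:
    sess.add(Image(id=1, stnd_ymd="20211223", big_blob="x" * 1000))
    sess.commit()
    # load_only restricts the SELECT list to the given columns;
    # big_blob is deferred and never transferred from the database
    row = (
        sess.query(Image)
        .options(load_only(Image.id, Image.stnd_ymd))
        .first()
    )
    fetched = (row.id, row.stnd_ymd)
    print(fetched)
```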

Try using fastapi-profiler when looking for performance bottlenecks in your code. Here is a simple configuration I used in one of my projects:

# import path assumes the fastapi-cprofile package, which provides CProfileMiddleware
from fastapi_cprofile.profiler import CProfileMiddleware

app.add_middleware(
    CProfileMiddleware,
    enable=True,
    print_each_request=True,
    strip_dirs=False,
    sort_by="cumtime",
)
devaerial
  • Hi @devaerial, thank you very much for your comment. As you suggested, I changed the query and it reduced the running time significantly. Thank you very much for your help! – dhuhd Jan 07 '22 at 05:13

Which ORM framework do you use? Tortoise? SQLAlchemy? Fastapi-pagination integrates with a bunch of frameworks.

Then you want to understand what SQL query is produced. For this, I would debug the request and step into the ORM framework code to find the place where the SQL query is generated. Usually, simply stepping in with a debugger works.

Some ORM frameworks let you simply log the SQL queries. Here is a question about SQLAlchemy query logging.

After you have found your query, test it manually and see what you can improve. The ANALYZE command can be handy for getting information about which table indexes were used.
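
For SQLAlchemy specifically, the quickest way to see every generated query is passing `echo=True` to `create_engine`; a minimal sketch (in-memory SQLite standing in for MariaDB):

```python
from sqlalchemy import create_engine, text

# echo=True makes the engine log every SQL statement (with parameters)
# through the `sqlalchemy.engine` logger, so you can inspect the exact query
engine = create_engine("sqlite://", echo=True)

with engine.connect() as conn:
    value = conn.execute(text("SELECT 1 + 1")).scalar()

print(value)  # 2, with the emitted SQL logged above it
```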

andnik
  • Hi @andnik, thank you very much for your comment. I used SQLAlchemy. Your link, especially the query log one helped me a lot to solve the issue. Thank you very much for your help! – dhuhd Jan 07 '22 at 05:15
  • My pleasure, @dhuhd, please consider upvoting my answer if it was helpful. On my previous project I implemented pagination myself, don't remember the details, but the existing one was not what I was looking for. – andnik Jan 07 '22 at 07:51