
I'm building a web service that needs to consult a ~80 MB file, perform a lookup (and some simple math on two of the columns) using user-submitted data, and return results. The file looks like this:

1 236 comment
236 13255 comment
....

The incoming request includes a number, and I need to find the row whose range (column 1 to column 2) contains that number and return its comment.

My first thought was to open and scan the file on every request. This slows the response down significantly and leads to a 2-3 second page load.
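For concreteness, a minimal sketch of that per-request version (assuming whitespace-separated columns; the function and file names are placeholders):

    def lookup(path, number):
        # Scan the whole file on every request: O(n) I/O and parsing each time.
        with open(path) as f:
            for line in f:
                start, end, comment = line.split(maxsplit=2)
                if int(start) <= number <= int(end):
                    return comment.rstrip("\n")
        return None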

My second idea was to load the file into memory once per worker (using Flask and Gunicorn). This effectively DoS'ed my own cloud instance: the per-worker copies consumed far too much memory. I'm sure there's a more memory-efficient way that doesn't require purchasing more resources.
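Roughly what that in-memory version looks like, as a sketch assuming the rows are sorted by the first column and the ranges don't overlap (all names are placeholders). Parsing happens once at worker start-up, and each request is a binary search:

    import bisect

    def load_table(path):
        # Parse once at worker start-up; keep the starts separate for bisect.
        starts, rows = [], []
        with open(path) as f:
            for line in f:
                start, end, comment = line.split(maxsplit=2)
                starts.append(int(start))
                rows.append((int(start), int(end), comment.rstrip("\n")))
        return starts, rows

    STARTS, ROWS = load_table("table.txt")

    def lookup(number):
        # Find the last range that starts at or before `number`.
        i = bisect.bisect_right(STARTS, number) - 1
        if i >= 0 and ROWS[i][0] <= number <= ROWS[i][1]:
            return ROWS[i][2]
        return None

Each Gunicorn worker gets its own copy of these structures, which is where the memory multiplication comes from.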

I was thinking of writing a small local web service that would respond to queries on the loopback interface. That shouldn't be too slow (a web API request that never leaves the box), but it adds some complexity.

Is there something simple I've overlooked? Is there a design pattern I'm not familiar with that would help?

  • Sounds like on app init your application could pre-compute all possible values, store them in a fast key-value store (Redis?), and do a fast lookup at runtime. Otherwise, try sorting your input and using binary search to find the right comment. – Adam Smith Mar 08 '21 at 02:38
  • @AdamSmith, Redis is a pretty good idea, actually. I thought about speeding up the search, but from my testing I'm losing most of my time to loading the file into memory, not searching. – helloCode0135 Mar 08 '21 at 02:42
  • Yeah, you definitely don't want to load an 80 MB file every time you get a request. At the absolute minimum you should be loading the file, finding the solution, and memoizing that in Redis; for my money, you should do the whole thing for every potential valid input at app init. – Adam Smith Mar 08 '21 at 02:46
  • I'm not sure I understand the use case perfectly, but this sounds like a good fit for storing the data in a local SQLite database. – Anon Coward Mar 08 '21 at 03:41
  • How is it possible to use "too much memory" when you load 80 MB into RAM? Even with 10 workers that's less than 1 GB. Serious question. – Jürgen Gmach Mar 08 '21 at 03:58
  • @AnonCoward, I shied away from SQL because the first two columns of the data can change frequently, meaning there's probably no easy way to index. I suppose I could purge and re-ingest the data when it changes (see the SQLite sketch after these comments)... – helloCode0135 Mar 08 '21 at 13:16
  • @J.G., trying to stay within the free tier of AWS – helloCode0135 Mar 08 '21 at 13:17
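A minimal sketch of the memoize-in-Redis idea from the comments, assuming a local Redis instance and the redis-py client (the key prefix and TTL are arbitrary placeholders):

    import redis

    r = redis.Redis()  # assumes Redis running on localhost:6379

    def cached_lookup(number):
        key = f"range-comment:{number}"
        hit = r.get(key)
        if hit is not None:
            return hit.decode()
        comment = lookup(number)  # the slow file/DB lookup sketched above
        if comment is not None:
            r.set(key, comment, ex=3600)  # cache for an hour; tune as needed
        return comment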
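And a minimal sketch of the SQLite route, with the purge-and-re-ingest done in a single transaction so lookups never see a half-built table (table, column, and function names are all placeholders):

    import sqlite3

    def build_db(db_path, data_path):
        con = sqlite3.connect(db_path)
        con.execute("CREATE TABLE IF NOT EXISTS ranges (lo INTEGER, hi INTEGER, comment TEXT)")
        con.execute("CREATE INDEX IF NOT EXISTS idx_lo ON ranges(lo)")
        with con:  # one transaction: purge and re-ingest atomically
            con.execute("DELETE FROM ranges")
            with open(data_path) as f:
                rows = (line.rstrip("\n").split(maxsplit=2) for line in f)
                con.executemany("INSERT INTO ranges VALUES (?, ?, ?)", rows)
        return con

    def lookup_db(con, number):
        # The index on lo narrows the scan; the hi check picks the matching row.
        row = con.execute(
            "SELECT comment FROM ranges WHERE lo <= ? AND hi >= ? "
            "ORDER BY lo DESC LIMIT 1",
            (number, number),
        ).fetchone()
        return row[0] if row else None

SQLite reads from disk through its own page cache, so per-worker memory stays small regardless of file size.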

0 Answers