I know there have been questions about this in the past, such as here and here, but I haven't really seen anything addressing very large datasets.
So I have a large amount of structured data (information about streams across the United States; each stream has time series data, and every stream is identified by a unique ID). The data is currently stored in NetCDF files, which are split into smaller segments to avoid one huge file. If we want access to the data for just one stream (because not many people want to see all 2.7 million streams simultaneously), we have to loop through every NetCDF file and extract the data for that one stream, roughly as in the sketch below. We have also built a REST API (django-rest-framework) that does the same thing for anyone calling the endpoints.
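For context, this is a minimal sketch of the current per-stream extraction, assuming the netCDF4 library and placeholder variable names ("station_id", "streamflow", "time") and file layout, not the exact names in our files:

```python
import glob
import numpy as np
from netCDF4 import Dataset  # pip install netCDF4

def extract_stream(stream_id, pattern="data/segment_*.nc"):
    """Pull one stream's time series by scanning every NetCDF segment.

    Variable names and the (station, time) dimension order are placeholders
    for whatever the real files use.
    """
    times, values = [], []
    for path in sorted(glob.glob(pattern)):
        with Dataset(path) as nc:
            ids = nc.variables["station_id"][:]
            idx = np.nonzero(ids == stream_id)[0]
            if idx.size == 0:
                continue  # this segment holds no data for the stream
            times.append(nc.variables["time"][:])
            values.append(nc.variables["streamflow"][idx[0], :])
    return np.concatenate(times), np.concatenate(values)
```

The point is that every request pays the cost of opening and scanning every segment, even though only one stream's slice is returned.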
I feel there must be a more efficient way to do this. I have considered loading all of the data into a database, but my concern is that this might actually be slower than looping through the files, since putting everything in one place would take multiple terabytes of disk space. I was reading this article about MongoDB, and it seems their products could help solve this problem. My questions are: will storing all of this data in a database actually save time when retrieving data, and how difficult would it be to implement?
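To make the comparison concrete, here is a hedged sketch of what an indexed lookup might look like if the data lived in MongoDB. The database, collection, and field names ("streams_db", "timeseries", "stream_id", "time", "value") are assumptions for illustration, not an existing schema:

```python
from pymongo import MongoClient, ASCENDING  # pip install pymongo

# Hypothetical layout: one document per stream per time step (or chunk),
# keyed by the stream's unique ID.
client = MongoClient("mongodb://localhost:27017")
coll = client["streams_db"]["timeseries"]

# A one-time compound index lets the server jump straight to the matching
# documents instead of scanning every record on disk.
coll.create_index([("stream_id", ASCENDING), ("time", ASCENDING)])

def get_stream(stream_id):
    """Return all (time, value) pairs for a single stream via the index."""
    cursor = coll.find({"stream_id": stream_id},
                       {"_id": 0, "time": 1, "value": 1})
    return [(doc["time"], doc["value"]) for doc in cursor]
```

This is what I imagine the REST endpoint could call instead of looping over files; what I don't know is whether the index lookup would beat the file scan in practice given the total data volume.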