It may be a silly basic question but as described in the title, I am wondering how PostgreSQL deals with performance when having millions of entries (with the possibility of reaching a billion entries).
To put it in a more concrete way, I want to store data (audio, photos and videos) in my database (I'm only storing their path, files are organised in the file system), but I have to decide wether I use a single table "data" to store all the different types of data, or multiple tables ("data_audio", "data_photos", "data_videos") to separate those types.
The reason why I am asking this question is that I have something like 95% of photos and 5% of audio and videos, and if I want to query my database for an audio entry, I don't want it to be slowed by all the photos entries (searching for a line among a thousand must be different than searching among a million). So I would like to know how PostgreSQL deals with this and if there exists some way to have the best optimisation.
I have read this topic that is really interesting and seems relevant: How does database indexing work?
Is it the way I should do?
Recap of the core stored informations I will have in my core tables:
1st option:
DATA TABLE (containing audio, photos and videos):
- id of type bigserial
- _timestamp of type timestamp
- path_file of type text
USERS TABLE:
- id of type serial
- forename of type varchar(255)
- surname of type varchar(255)
- birthdate of type date
- email_address of type varchar(255)
DATA USERS RELATION TABLE:
- id_data of type bigserial
- id_user of type serial
ACTIVITIES TABLE:
- id of type serial
- name of type varchar(255)
- description of type text
DATA ACTIVITIES RELATION TABLE:
- id_data of type bigserial
- id_activity of type serial
(SEARCH queries are mainly done on DATA._timestamp and ACTIVITIES.name fields after filtering data by USERS.id)
2nd option (only switching the previous DATA TABLE with the following three tables and keeping all the other tables):
- DATA_AUDIO TABLE
- DATA_PHOTOS TABLE
- DATA_VIDEOS TABLE
Additional question: Is it a good idea to have a database per user ? (in the storyline, to be able to query the database for data depends on whether you have the permission or not, and if you want to retrieve data from two different users, you have to ask the permission from both users, and the permission process is a process in its own right, it is not handled here, so let’s say that when you query the database, it will always be queries on the same user)
I hope I have been clear, thanks in advance for any help or advices!
Cyrille