What is the fastest way to query an array of id's?

Question

I'm constructing a database for a project of mine. Currently, I have a table with images and a table with products.

product
 Column         |   Type    | 
----------------+-----------+
 id             | integer   |
 images         | integer[] | 
 display_image  | integer   | 
 ...

image
 Column         |   Type    | 
----------------+-----------+
 id             | integer   |
 data           | text      |

So the basic idea is to store the ids of all images that belong to one product in the product.images field. It will be a use case of the database to find the corresponding product to a specific image. Additionally, one image can belong to multiple products.

Now I'm not sure if this is the fastest way to get results if I query which products belong to an image and the database will hold millions of products and images. Would it be faster to have an additional field image.products where I store all the products that are associated with one image?

The query to get products for a specific image would currently look like this (I'm using PostgreSQL as the database) :

SELECT * from product where image.id = ANY(product.images)

Having another field that needs to be maintained would be a burden on the API-side. Before I do the refactoring for this I would like to hear experienced opinions, if the query mentioned above can have a performance impact later on.

Thank you in advance for your replies.

ah, the old "delimited column" question. SQL was expressly designed to not handle this, because it runs counter to the underlying philosophy/math it runs on (databases - implementations of SQL - sometimes find it handy to provide types that look like they'd be useful for this). The database is going to be fastest if you pull out the image keys into their own table. [See also this question](https://stackoverflow.com/q/3653462/812837). And look at [the tip near the bottom of this page](https://www.postgresql.org/docs/current/static/arrays.html). — Clockwork-Muse, Dec 30 '17 at 16:50

score 0 · Answer 1 · answered Dec 30 '17 at 15:52

0

For a one-to-many relationship like this (where each product only has one image but each image can have multiple products) you're best off just using a foreign key on image_id.

You only really need an association table for many-to-many, which would be if products could have multiple images and images could be associated with more than one product. If this is the case, use a table called something like product_images_assoc with two columns, product_id and image_id.

With this association table it is easy to get all image_ids for any one product_id and vice versa.

Hope this helps!

answered Dec 30 '17 at 15:52

ztaylor54

396
3
15

Sorry, I guess there was something that I did not explain properly. A product can hold several images. For example product.A has images [1, 2, 3] and product.B holds images [2,3,4]. – Ogofo Dec 30 '17 at 16:04
In that case you should create an association table that links `product_id` to `image_id`. Then you query the ids associated with the image/product in a subquery and use that in the main `SELECT`. I'd provide a better example but I'm on mobile. – ztaylor54 Dec 30 '17 at 16:08
I should also mention that in the association table, both `id` columns should be foreign keys to their respective tables. – ztaylor54 Dec 30 '17 at 16:12

What is the fastest way to query an array of id's?

1 Answers1