197

I have two tables in MySQL. Table Person has the following columns:

id name fruits

The fruits column may hold null or an array of strings like ('apple', 'orange', 'banana'), or ('strawberry'), etc. The second table is Table Fruit and has the following three columns:

fruit_name color price
apple red 2
orange orange 3
----------- -------- ------

So how should I design the fruits column in the first table so that it can hold array of strings that take values from the fruit_name column in the second table? Since there is no array data type in MySQL, how should I do it?

tonga
  • 11,749
  • 25
  • 75
  • 96
  • 4
    see [How to store an array into mysql?](http://stackoverflow.com/q/3413291/309483) and [Storing arrays in MySQL?](http://stackoverflow.com/q/2031254/309483) and [How can I simulate an array variable in MySQL?](http://stackoverflow.com/q/12176709/309483) – Janus Troelsen Jun 28 '13 at 18:52
  • 1
    how about adding it as separate entries, orange, 2, 1, rose, 2, 1, etc. and then you can use queries to treat them as though they were arrays. – Sai Jun 28 '13 at 18:53
  • @JanusTroelsen: I'm not using PHP to read/write DB. So is there a universal way to do it? – tonga Jun 28 '13 at 18:59
  • 1
    @tonga check my fiddle is that what you want ? – echo_Me Jun 28 '13 at 19:01

7 Answers7

245

The proper way to do this is to use multiple tables and JOIN them in your queries.

For example:

CREATE TABLE person (
`id` INT NOT NULL PRIMARY KEY,
`name` VARCHAR(50)
);

CREATE TABLE fruits (
`fruit_name` VARCHAR(20) NOT NULL PRIMARY KEY,
`color` VARCHAR(20),
`price` INT
);

CREATE TABLE person_fruit (
`person_id` INT NOT NULL,
`fruit_name` VARCHAR(20) NOT NULL,
PRIMARY KEY(`person_id`, `fruit_name`)
);

The person_fruit table contains one row for each fruit a person is associated with and effectively links the person and fruits tables together, I.E.

1 | "banana"
1 | "apple"
1 | "orange"
2 | "straberry"
2 | "banana"
2 | "apple"

When you want to retrieve a person and all of their fruit you can do something like this:

SELECT p.*, f.*
FROM person p
INNER JOIN person_fruit pf
ON pf.person_id = p.id
INNER JOIN fruits f
ON f.fruit_name = pf.fruit_name
Bennik2000
  • 1,112
  • 11
  • 25
Bad Wolf
  • 8,206
  • 4
  • 34
  • 44
  • 13
    The third table is the link table between Person and Fruit. So if a person has 100 fruits. I need to create 100 rows in the third table, right? Is this efficient? – tonga Jun 28 '13 at 19:11
  • 2
    @tonga Exactly, each of the 100 rows would have the same `person_id` but a different `fruit_name`. This is is effectively an implementation of the theory from Janus' answer. – Bad Wolf Jun 28 '13 at 19:14
  • 1
    To provide an example for the footnote[1] in Janus' answer, if you needed to also store the quantity of fruit a person had, then you would add a quantity column to the `person_fruit` table. – Bad Wolf Jun 28 '13 at 19:17
  • 2
    Is it always true that any relation between two tables needs to be stored in the third table? Can I just do a query to find the relation by just storing the primary keys from two tables? – tonga Jun 28 '13 at 19:20
  • 5
    Yes, which is how the example is setup now. Any information about the person should be in the `person` table, any information about the fruit in the `fruits` table, and any information specifically about the relationship between a particular person and a particular fruit in the `person_fruit` table. Because in this example there isn't any additional information the `person_fruit` table is only two columns, the primary keys of the `person` and `fruits` tables. Quantity of a specific fruit is an example of something else that could go in the `person_fruit` table however. – Bad Wolf Jun 28 '13 at 19:26
  • Thank you for solution. Was looking for it for few days. – Stopfan Jan 17 '15 at 09:17
  • What is a `NULL` primary key? – Iharob Al Asimi Mar 20 '15 at 20:17
  • 4
    Would it not be better to use a `INT` for a key in `fruits` and only have this `INT` in `person_fruit`? So the name can be changed later and would also need less space if you have not many more rows in `fruits` than in `person_fruit`. – 12431234123412341234123 Aug 21 '17 at 08:43
  • You should really have an order by clause for it to be considered an array as the database doesn't guarantee order, while arrays usually do. – wobbily_col Jan 23 '18 at 13:59
  • I'm new to MySQL, could you talk more on the last statements you made, with the INNER JOIN, I don't quite get it – Mofe Ejegi Jun 28 '18 at 18:35
  • 1
    What if person 1 wanted to edit their "apple". Wouldn't that reference to the same fruit and affect person 2's apple? – alexr101 Jun 28 '18 at 19:05
  • Isn't this very slow because you have to scan through the entire `person_fruit` table to find all the fruits associated with a person? – Foobar Nov 02 '19 at 17:22
  • I'm really late to this but it's really helped me. I wondered if you would have a person_fruit table for each person, e.g. username123_fruit, or have one big table to store all person fruit relationships? – TMG Sep 16 '20 at 15:58
  • I am confused as of why this would be better. As a result of that query, assuming you have one "person" and 2 "fruits", and that person has both fruits, the SQL query will return you 2 rows, with each the entire information of the person, _and_ the fruit in each of the row (only the fruit information will change in the resulting rows' columns). Why is this more effective than selecting the resulting rows from the fruit table where X column value is = person id? Then you get only the amount of rows there are actually fruits for this person _without_ the person's data duplicated in each row... –  Mar 05 '22 at 07:04
  • So this is how how a typical M-M relationship is implemented right? What if the array we are storing were just an int[] (integer array) or some other basic type that isn't a foreign key? would we just create the third table but without reference to the nonexistent table (the data in that column would just be themselves instead of foreign keys)? I suppose the answer is a trivial "yes" but I still want to make sure I'm following correctly. – Twisted on STRIKE at1687989253 May 20 '22 at 14:30
  • This is just wrong: > "When you want to retrieve a person and all of their fruit you can do something like this:" < It gets _all_ entries in the third table. That very SQL never defines "a user". It literally gets ALL existing users and then all existing entries in the third table for all those users. This is not how you get _a_ person's fruits. It gets _all persons with all their associated fruits_ To get _a_ specific persons fruit you can do a plain simple select from where query Not sure why this is the accepted answer to the initial question at all. –  Jun 26 '22 at 06:53
84

The reason that there are no arrays in SQL, is because most people don't really need it. Relational databases (SQL is exactly that) work using relations, and most of the time, it is best if you assign one row of a table to each "bit of information". For example, where you may think "I'd like a list of stuff here", instead make a new table, linking the row in one table with the row in another table.[1] That way, you can represent M:N relationships. Another advantage is that those links will not clutter the row containing the linked item. And the database can index those rows. Arrays typically aren't indexed.

If you don't need relational databases, you can use e.g. a key-value store.

Read about database normalization, please. The golden rule is "[Every] non-key [attribute] must provide a fact about the key, the whole key, and nothing but the key.". An array does too much. It has multiple facts and it stores the order (which is not related to the relation itself). And the performance is poor (see above).

Imagine that you have a person table and you have a table with phone calls by people. Now you could make each person row have a list of his phone calls. But every person has many other relationships to many other things. Does that mean my person table should contain an array for every single thing he is connected to? No, that is not an attribute of the person itself.

[1]: It is okay if the linking table only has two columns (the primary keys from each table)! If the relationship itself has additional attributes though, they should be represented in this table as columns.

Janus Troelsen
  • 20,267
  • 14
  • 135
  • 196
  • 5
    Thanks Janus. That makes sense. Now I understand why MySQL doesn't support array type in a column. – tonga Jun 28 '13 at 19:07
  • some key value stores: MongoDB, Elasticsearch, Couch DB, BigTable etc. – Sai Jun 28 '13 at 19:13
  • 3
    @Sai - For the stuff I'm doing, do I really need the NoSQL solution? – tonga Jun 28 '13 at 19:48
  • I guess it depends on the nature of the data, as well as the type of queries you would like to run. I just suggested these in case you were relying on storing arrays primarily. Cheers – Sai Jun 28 '13 at 19:51
  • 1
    OK, so if I have a table in which a field contains a numerical array of thousands of elements, e.g., some 2D data collected from a sensor, is it much better to use NoSQL DB? – tonga Jun 28 '13 at 19:53
  • 7
    @tonga: The amount of data doesn't determine the db type to use, the nature of the data does. If there are no relations, you don't need at relational database. But since this is the industry standard, you may keep it and just not use the relational features. Most data is relational in some way! A common reason for denormalizing relational databases or using key-value stores is because of performance reasons. But those problems only arise once you have MILLIONS of rows! Don't optimize prematurely! I'd recommend going just going with an SQL db (I recommend PostgreSQL). If you have problems, ask. – Janus Troelsen Jun 28 '13 at 19:58
  • 3
    PostgreSQL also has key-value stores built-in, which means it'd be even easier to move away from the relational model if it doesn't fit you. – Janus Troelsen Jun 28 '13 at 20:00
  • The first criteria of normslitation: atomicy also must not be violated. – Jakob Alexander Eichler Mar 16 '15 at 03:32
  • Is a string (TEXT or VARCHAR) not an array? You can argue that it is an array of characters or bytes. – 12431234123412341234123 Aug 21 '17 at 08:49
  • 1
    Acutally sql standard has array type since sql99, it's just that mysql didn't implement them. see http://farrago.sourceforge.net/design/CollectionTypes.html – sify Feb 25 '22 at 02:41
  • "there are no arrays in SQL" has been plain wrong for decades: MySQL != SQL – gouessej Jul 03 '23 at 14:01
71

MySQL 5.7 now provides a JSON data type. This new datatype provides a convenient new way to store complex data: lists, dictionaries, etc.

That said, arrays don't map well databases which is why object-relational maps can be quite complex. Historically people have stored lists/arrays in MySQL by creating a table that describes them and adding each value as its own record. The table may have only 2 or 3 columns, or it may contain many more. How you store this type of data really depends on characteristics of the data.

For example, does the list contain a static or dynamic number of entries? Will the list stay small, or is it expected to grow to millions of records? Will there be lots of reads on this table? Lots of writes? Lots of updates? These are all factors that need to be considered when deciding how to store collections of data.

Also, Key/Value data stores, Document stores such as Cassandra, MongoDB, Redis etc provide a good solution as well. Just be aware of where the data is actually being stored (if its being stored on disk or in memory). Not all of your data needs to be in the same database. Some data does not map well to a relational database and you may have reasons for storing it elsewhere, or you may want to use an in-memory key:value database as a hot-cache for data stored on disk somewhere or as an ephemeral storage for things like sessions.

Kerem
  • 11,377
  • 5
  • 59
  • 58
Charles D Pantoga
  • 4,307
  • 1
  • 15
  • 14
69

A sidenote to consider, you can store arrays in Postgres.

Eric Grotke
  • 4,651
  • 3
  • 21
  • 19
  • 11
    Additional note: they can be indexed, so queries checking for existence of specific values in an array can be very fast. Same goes for complex JSON types. – timetofly Aug 06 '18 at 14:08
  • 11
    This doesn't answer the question in any way. OP asked about MySQL. – jhpratt Jul 30 '19 at 01:44
  • 2
    If you use ArrayField in Postgres and have a exhaustive list of values in that column(like a fixed list of tags), you can create a GIN index - it will dramatically speed up the queries on that column. – lumos42 Aug 20 '19 at 08:25
53

In MySQL, use the JSON type.

Contra the answers above, the SQL standard has included array types for almost twenty years; they are useful, even if MySQL has not implemented them.

In your example, however, you'll likely want to create three tables: person and fruit, then person_fruit to join them.

DROP TABLE IF EXISTS person_fruit;
DROP TABLE IF EXISTS person;
DROP TABLE IF EXISTS fruit;

CREATE TABLE person (
  person_id   INT           NOT NULL AUTO_INCREMENT,
  person_name VARCHAR(1000) NOT NULL,
  PRIMARY KEY (person_id)
);

CREATE TABLE fruit (
  fruit_id    INT           NOT NULL AUTO_INCREMENT,
  fruit_name  VARCHAR(1000) NOT NULL,
  fruit_color VARCHAR(1000) NOT NULL,
  fruit_price INT           NOT NULL,
  PRIMARY KEY (fruit_id)
);

CREATE TABLE person_fruit (
  pf_id     INT NOT NULL AUTO_INCREMENT,
  pf_person INT NOT NULL,
  pf_fruit  INT NOT NULL,
  PRIMARY KEY (pf_id),
  FOREIGN KEY (pf_person) REFERENCES person (person_id),
  FOREIGN KEY (pf_fruit) REFERENCES fruit (fruit_id)
);

INSERT INTO person (person_name)
VALUES
  ('John'),
  ('Mary'),
  ('John'); -- again

INSERT INTO fruit (fruit_name, fruit_color, fruit_price)
VALUES
  ('apple', 'red', 1),
  ('orange', 'orange', 2),
  ('pineapple', 'yellow', 3);

INSERT INTO person_fruit (pf_person, pf_fruit)
VALUES
  (1, 1),
  (1, 2),
  (2, 2),
  (2, 3),
  (3, 1),
  (3, 2),
  (3, 3);

If you wish to associate the person with an array of fruits, you can do so with a view:

DROP VIEW IF EXISTS person_fruit_summary;
CREATE VIEW person_fruit_summary AS
  SELECT
    person_id                                                                                              AS pfs_person_id,
    max(person_name)                                                                                       AS pfs_person_name,
    cast(concat('[', group_concat(json_quote(fruit_name) ORDER BY fruit_name SEPARATOR ','), ']') as json) AS pfs_fruit_name_array
  FROM
    person
    INNER JOIN person_fruit
      ON person.person_id = person_fruit.pf_person
    INNER JOIN fruit
      ON person_fruit.pf_fruit = fruit.fruit_id
  GROUP BY
    person_id;

The view shows the following data:

+---------------+-----------------+----------------------------------+
| pfs_person_id | pfs_person_name | pfs_fruit_name_array             |
+---------------+-----------------+----------------------------------+
|             1 | John            | ["apple", "orange"]              |
|             2 | Mary            | ["orange", "pineapple"]          |
|             3 | John            | ["apple", "orange", "pineapple"] |
+---------------+-----------------+----------------------------------+

In 5.7.22, you'll want to use JSON_ARRAYAGG, rather than hack the array together from a string.

Talk Nerdy To Me
  • 626
  • 5
  • 21
drew
  • 2,949
  • 3
  • 25
  • 27
  • Keep in mind that using JSON_ARRAYAGG isn't without cost in terms of performance, native SQL array support would eliminate the encoding and the decoding. – gouessej Jul 03 '23 at 13:56
0

Use database field type BLOB to store arrays.

Ref: http://us.php.net/manual/en/function.serialize.php

Return Values

Returns a string containing a byte-stream representation of value that can be stored anywhere.

Note that this is a binary string which may include null bytes, and needs to be stored and handled as such. For example, serialize() output should generally be stored in a BLOB field in a database, rather than a CHAR or TEXT field.

Community
  • 1
  • 1
webdevfreak
  • 147
  • 2
-10

you can store your array using group_Concat like that

 INSERT into Table1 (fruits)  (SELECT GROUP_CONCAT(fruit_name) from table2)
 WHERE ..... //your clause here

HERE an example in fiddle

echo_Me
  • 37,078
  • 5
  • 58
  • 78