1

So I am a student currently learning about PostgreSQL. I am trying to figure out the way, how to randomly seed data. I have 10M users and 100 stocks.

Currently my tables will look like:

CREATE TABLE user (
  user_id INTEGER NOT NULL,   
  amount_of_stocks [][] array, -- this is just assumption
  PRIMARY KEY (user_id)
);

CREATE TABLE stock (
  stock_id INTEGER NOT NULL,   
  amount_per_stock INT,
  quantity INT
  PRIMARY KEY (stock_id)
);

How would I store 100 different stocks for each user?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228

2 Answers2

3

Sounds like a classical many-to-many relationship. Should not involve arrays at all. Assuming Postgres 10 or later, use something along these lines:

CREATE TABLE users (  -- "user" is a reserved word!
  user_id  int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, username text UNIQUE NOT NULL  -- or similar
);

CREATE TABLE stock (
  stock_id int GENERATED ALWAYS AS IDENTITY PRIMARY KEY
, stock    text UNIQUE NOT NULL  -- or similar
);

CREATE TABLE user_stock (
  user_id  int REFERENCES users
, stock_id int REFERENCES stock
, amount   int NOT NULL
, PRIMARY KEY (user_id, stock_id)
);

Detailed explanation:

Seed

Postgres provides generate_series() to conveniently generate rows. random() is instrumental to generate random numbers:

INSERT INTO users(username)
SELECT 'user_' || g
FROM   generate_series(1, 10000000) g; -- 10M (!) - try with just 10 first

INSERT INTO stock(stock)
SELECT 'stock_' || g
FROM   generate_series(1, 100) g;

Experiment with a small number of users first. 10M users * 100 stocks generates a billion rows. Takes some time and occupies some space.

How would I store 100 different stocks for each user?

INSERT INTO user_stock
        (user_id,   stock_id, amount)
SELECT u.user_id, s.stock_id, ceil(random() * 1000)::int
FROM   users u, stock s;  -- cross join

Every user gets 100 different stocks - though everyone gets the same set in this basic example, you did not define more closely. I added a random amount per stock between 1 and 1000.

About the cross join to produce the Cartesian product:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you for the explanation, really helpful – Andrii Malanchuk Dec 18 '19 at 06:35
  • Hi Erwin what do you use to format your SQL statements? You do this by hand or using a tool? – user2609980 Dec 19 '19 at 13:09
  • 1
    @ErwinRooijakkers: I do it by hand, mostly. Seems tedious, but it helps me process, structure and understand complex queries a lot. I prefer this compact form over the often noisy format of various clients and frameworks. Actually, I like this special format for `INSERT` (see update) to see which expression goes into which column at a glance. – Erwin Brandstetter Dec 19 '19 at 23:31
0
CREATE TABLE user (
  user_id INTEGER NOT NULL,   
  stocks text[], 
  PRIMARY KEY (user_id)
);

Store a list of the primary keys in your Stock table, so you can easily look up their values with a select statement.

If you wanted to, you could make the array two-dimensional, and store the value.. but that violates some principle, I'm sure, as you already have a table for that purpose.

Eric Lang
  • 274
  • 1
  • 2
  • 16