I have been practising bulk loading using Laurenz Albe's blog https://www.cybertec-postgresql.com/en/postgresql-bulk-loading-huge-amounts-of-data/. The problem started when I changed it a little bit. The table is the same:
CREATE TABLE t_sample
(
a varchar(50),
b int,
c varchar(50),
d int
);
Instead of
BEGIN;
INSERT INTO t_sample VALUES ('abcd', 1, 'abcd', 1);
INSERT INTO t_sample VALUES ('abcd', 1, 'abcd', 1);
INSERT INTO t_sample VALUES ('abcd', 1, 'abcd', 1);
…
COMMIT;
I use
BEGIN;
INSERT INTO t_sample VALUES
('abcd', 1, 'abcd', 1),
('abcd', 1, 'abcd', 1),
('abcd', 1, 'abcd', 1),
…
COMMIT;
For those who do not want to read the blog: I am trying to compare memory consumption between the COPY and INSERT methods.
Also, instead of 1 million records I used 2097152 records. I keep these commands in a file, multipleinsert.sql, and run it like this: psql -d load_test -f multipleinsert.sql
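For completeness, a file like this can be generated with a small shell loop along these lines (just a sketch; the exact generation method does not matter for the question):

# build one multi-row INSERT with 2097152 value tuples inside a single transaction
{
  echo "BEGIN;"
  echo "INSERT INTO t_sample VALUES"
  for i in $(seq 1 2097151); do
    echo "('abcd', 1, 'abcd', 1),"
  done
  echo "('abcd', 1, 'abcd', 1);"
  echo "COMMIT;"
} > multipleinsert.sql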
Also, as stated in the blog, I used the COPY method as follows (of course with the same number of records, 2097152):
COPY t_sample FROM stdin;
abcd 1 abcd 1
abcd 1 abcd 1
abcd 1 abcd 1
...
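The COPY input can be built the same way (a sketch assuming the default tab-separated text format and the \. end-of-data marker that psql expects; the file name copyload.sql is just for illustration):

# build a COPY ... FROM stdin script with 2097152 tab-separated rows
{
  echo "COPY t_sample FROM stdin;"
  for i in $(seq 1 2097152); do
    printf 'abcd\t1\tabcd\t1\n'
  done
  echo '\.'
} > copyload.sql
psql -d load_test -f copyload.sql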
The execution time for the INSERT is 14.543 s and for COPY it is 1.237 s. Yet this is not the point I want to compare. When I use COPY, the available memory on the server barely decreases, but during execution of the INSERT statement the available memory decreases by nearly 6.5 GB. The file that contains the INSERT statement is only about 49 MB, so PostgreSQL tries to cache it, but why does memory usage scale to ~6.5 GB? Is there a method for (roughly) calculating the memory PostgreSQL will consume for such a load?
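For reference, one way to watch where the memory goes while the file loads is to sample the backend's resident set size (a rough sketch; the PID lookup assumes load_test has a single active backend):

# find the PID of the backend doing the load (assumes a single active load_test session)
PID=$(psql -d load_test -Atc "SELECT pid FROM pg_stat_activity WHERE datname = 'load_test' AND state = 'active' LIMIT 1")
# sample that backend's resident set size once per second while the INSERT runs
watch -n 1 "grep VmRSS /proc/$PID/status"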
Note: "How does COPY work and why is it so much faster than INSERT?" explains well why COPY is faster than INSERT, but neither that post nor any other I found on the internet sheds light on memory consumption.