I want to remove duplicate rows from a table measurement
in a PostgreSQL 9.1 data base.
Some table information:
select column_name, data_type from information_schema.columns where table_name = 'measurement';
column_name | data_type
-------------+-----------
s_sum | real
s_l3 | real
s_l2 | real
s_l1 | real
q_sum | real
q_l3 | real
q_l2 | real
q_l1 | real
p_sum | real
p_l3 | real
p_l2 | real
p_l1 | real
irms_n | real
irms_l3 | real
irms_l2 | real
irms_l1 | real
urms_l3 | real
urms_l2 | real
urms_l1 | real
timestamp | integer
site | integer
id | integer
(22 rows)
and
select count(*) from measurement;
count
----------
56265678
(1 row)
So what I want to do is to remove duplicate rows where all columns except id
are equal. I went ahead and tried this with the approach in this answer.
SET temp_buffers = '1GB';
BEGIN;
CREATE TEMPORARY TABLE t_tmp AS
SELECT DISTINCT site,
timestamp,
urms_l1,
urms_l2,
urms_l3,
irms_l1,
irms_l2,
irms_l3,
irms_n,
p_l1,
p_l2,
p_l3,
p_sum,
q_l1,
q_l2,
q_l3,
q_sum,
s_l1,
s_l2,
s_l3,
s_sum
FROM measurement;
TRUNCATE measurement;
INSERT INTO measurement
SELECT * FROM t_tmp;
COMMIT;
where the echo / error is:
SET
BEGIN
SELECT 56103537
TRUNCATE TABLE
ERROR: duplicate key value violates unique constraint "measurement_pkey"
DETAIL: Key (id)=(1) already exists.
ROLLBACK
so it looks as if it would remove the duplicates alright (compare with number of rows of original table measurement
above) but then a primary key constraint is violated. I do not really know what is going on here, I assume that the INSERT
is not operating on the truncated table...
Update:
The requested sql schema is as follows:
--
-- PostgreSQL database dump
--
SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SET check_function_bodies = false;
SET client_min_messages = warning;
--
-- Name: plpgsql; Type: EXTENSION; Schema: -; Owner: -
--
CREATE EXTENSION IF NOT EXISTS plpgsql WITH SCHEMA pg_catalog;
--
-- Name: EXTENSION plpgsql; Type: COMMENT; Schema: -; Owner: -
--
COMMENT ON EXTENSION plpgsql IS 'PL/pgSQL procedural language';
SET search_path = public, pg_catalog;
SET default_tablespace = '';
SET default_with_oids = false;
--
-- Name: measurement; Type: TABLE; Schema: public; Owner: -; Tablespace:
--
CREATE TABLE measurement (
id integer NOT NULL,
site integer,
"timestamp" integer,
urms_l1 real,
urms_l2 real,
urms_l3 real,
irms_l1 real,
irms_l2 real,
irms_l3 real,
irms_n real,
p_l1 real,
p_l2 real,
p_l3 real,
p_sum real,
q_l1 real,
q_l2 real,
q_l3 real,
q_sum real,
s_l1 real,
s_l2 real,
s_l3 real,
s_sum real
);
--
-- Name: measurement_pkey; Type: CONSTRAINT; Schema: public; Owner: -; Tablespace:
--
ALTER TABLE ONLY measurement
ADD CONSTRAINT measurement_pkey PRIMARY KEY (id);
--
-- Name: public; Type: ACL; Schema: -; Owner: -
--
REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM postgres;
GRANT ALL ON SCHEMA public TO postgres;
GRANT ALL ON SCHEMA public TO PUBLIC;
--
-- PostgreSQL database dump complete
--
And then
SELECT id
FROM measurement
GROUP BY id
HAVING COUNT(*) > 1;
yields
id
----
(0 rows)