0

I have a database with repeated IDs, and I want to change the names of this repeated IDs.

So, I have a database of Vehicles and in columns I have the License number (id), the type of vehicle, the colour and also the make.

All the ids are repeated in the database, and I want to SELECT all of them except the first row of each different ID (something like a "DISTINCT id" but inverse...).

EDIT 2:

I have created this table

DROP TABLE IF EXISTS Proces1 CASCADE;
CREATE TABLE Proces1 AS
(
    SELECT id_importat AS id_aux, driver_city AS city_aux, driver_state AS state_aux, gender AS g_aux, race AS r_aux
    FROM ImportaViolations
    WHERE id_importat IN (
        SELECT id_importat
        FROM ImportaViolations
        GROUP BY id_importat    
        HAVING (COUNT(*) > 1))
    GROUP BY id_importat, driver_city, driver_state, gender, race
);

And in this table I have repeated id's but with different information in the columns.

Something like:

id_aux   city_aux      state_aux     g_aux     r_aux
1        London        England        M        WHITE
1        London        England        F        BLACK
2        Madrid        Spain          M        BLACK
2        London        England        F        WHITE
2        London        England        M        WHITE
...

So now, I want to SELECT all the rows with repeated id_aux except for the first one of each different id_aux. So I want to have this final result (in this example):

id_aux   city_aux      state_aux      g_aux    r_aux
1        London        England        F        BLACK
2        London        England        F        WHITE
2        London        England        M        WHITE
...
  • 1
    I removed the incompatible database tags. Please tag only with the database you are really using. – Gordon Linoff Dec 08 '17 at 20:21
  • 1
    You might need to add more specific sample data. For example, you could just say `WHERE Column1 <> 'a'` and get your desired results. I imagine your actual data is more complex. Also assuming it is more complex, you will need a way to know which row is the 'first' row. – Aaron Dietz Dec 08 '17 at 20:24
  • I don't get the question really. select * from table where column1 in ('c','e')? – user8834780 Dec 08 '17 at 20:24
  • 2
    SQL data is stored in unordered sets, which means just because a row shows up first in your result set, you need an `ORDER BY` to guarantee that it is the 'first' row. – Aaron Dietz Dec 08 '17 at 20:25
  • You could use a limit and offset.. example.. $sql = "SELECT * FROM tables LIMIT 10 OFFSET 15"; The SQL query above says "return only 10 records, start on record 16 (OFFSET 15)": Also you can use a DISTINCT to get rid of duplicate IDs – wbrandt Dec 08 '17 at 20:25
  • What are you trying to change the duplicated IDs to? Are you trying to have each record in your table have a unique ID? Because there are better ways of doing that. – digital.aaron Dec 08 '17 at 21:17
  • I believe you may be doing this the wrong way. When you create that new table the rows of that table are affected by the `GROUP BY id_importat, driver_city, driver_state, gender, race` so the row order of the new table is not the row order of the source table. Are there any date/time columns in the `ImportaViolations` table? e.g. date_created or date_modified – Paul Maxwell Dec 08 '17 at 22:50
  • @digital.aaron yes i'm trying to do that – theMaximumForce Dec 12 '17 at 16:53

3 Answers3

0

I've simply modified the answer to this question to better fit your needs: Select first row in each GROUP BY group?

Essentially, rather than selecting the first of each group using rk = 1, I've changed it to be rk > 1 and switched the column names around to match yours.

WITH MyTable AS (
    SELECT p.Id, 
           p.Column1, 
           p.Column2, 
           ROW_NUMBER() OVER(PARTITION BY p.Column1 
                             ORDER BY p.Column2 DESC) AS rk
      FROM MyTable p)
SELECT s.*
  FROM MyTable t
 WHERE s.rk > 1

EDIT: Changed rk = 2 to rk > 1 to select all but the first rather than just the second.

Omar Himada
  • 2,540
  • 1
  • 14
  • 31
0

Something like this?

SELECT
    ID
    , Column1
    , Column2
FROM
    (
        SELECT
            ID
            , Column1
            , Column2
            , ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Column1, Column2) R
        FROM YourTable
    ) Q
WHERE R > 1

Update for Edit 2:

SELECT
    id_aux
    , city_aux
    , state_aux
    , g_aux
    , r_aux
FROM
    (
        SELECT
            id_aux
            , city_aux
            , state_aux
            , g_aux
            , r_aux
            , ROW_NUMBER() OVER (PARTITION BY id_aux ORDER BY id_aux) R
        FROM YourTable
    ) Q
WHERE R > 1
Chris Mack
  • 5,148
  • 2
  • 12
  • 29
  • Thanks but I've tried and it doesn't work how I expected... :( – theMaximumForce Dec 08 '17 at 20:33
  • In what sense? I might be able to make an adjustment. – Chris Mack Dec 08 '17 at 20:33
  • Hi Chris, I have added new information to the question to help you understanding me (EDIT 2). Thanks. – theMaximumForce Dec 08 '17 at 20:51
  • Hi Andres - I understand, but as per Aaron Dietz's comment, you'll need some kind of `ORDER BY` to determine which is the first record per ID - unless it doesn't matter? – Chris Mack Dec 08 '17 at 20:56
  • it doesn't matter, i just want to get all of them except from ONE of the repeated id's (in each repeated id), it doesn't need to be the first one :) but just one – theMaximumForce Dec 08 '17 at 20:56
  • Ok, in that case I've added another example that should work, and if you need to alter the `ORDER BY` clause, which isn't really doing anything in my query, just list the columns you want to order by. – Chris Mack Dec 08 '17 at 21:00
-1

Please pay close attention to the order of rows in the following demonstration. & please note I added a few extra rows as well. Initially we start with "random" rows as and when created in ImportaViolations for this example "first row" for each id_importat

INSERT INTO ImportaViolations
    (id_importat, driver_city, driver_state, gender, race)
VALUES
    (1, 'London', 'England', 'M', 'WHITE'),
    (2, 'Madrid', 'Spain', 'M', 'BLACK'),

BUT if we run this query (with no "order by"):

SELECT id_importat AS id_aux, driver_city AS city_aux
     , driver_state AS state_aux, gender AS g_aux, race AS r_aux
     , rn
FROM (
      select id_importat, driver_city, driver_state, gender, race
           , row_number() over(partition by id_importat) as rn
      from ImportaViolations
     ) d
WHERE rn = 1

This is the result:

| id_aux | city_aux | state_aux | g_aux | r_aux | rn |
|--------|----------|-----------|-------|-------|----|
|      1 |   London |   England |     M | WHITE |  1 |
|      2 |   London |   England |     F | WHITE |  1 |

That result is affected by the necessary partition by (without this there would be a total of one row in the table with a row number of 1).

So: The moral of this tale is that you MUST carefully consider the ORDER to determine what should be "The FIRST ROW" of each id_importat.

SQL Fiddle Demo

CREATE TABLE ImportaViolations
    (id_importat int, driver_city varchar(6), driver_state varchar(7), gender varchar(1), race varchar(5))
;

INSERT INTO ImportaViolations
    (id_importat, driver_city, driver_state, gender, race)
VALUES
    (1, 'London', 'England', 'M', 'WHITE'),
    (2, 'Madrid', 'Spain', 'M', 'BLACK'),
    (1, 'London', 'England', 'F', 'BLACK'),
    (2, 'London', 'England', 'M', 'WHITE'),
    (1, 'London', 'England', 'F', 'BLACK'),
    (2, 'Madrid', 'Spain', 'M', 'BLACK'),
    (2, 'London', 'England', 'F', 'WHITE'),
    (1, 'London', 'England', 'M', 'WHITE'),
    (2, 'London', 'England', 'F', 'WHITE')
;

Main Query:

DROP TABLE IF EXISTS Proces1 CASCADE;
CREATE TABLE Proces1 AS 
(
    SELECT id_importat AS id_aux, driver_city AS city_aux
         , driver_state AS state_aux, gender AS g_aux, race AS r_aux
         , rn
    FROM (
          select id_importat, driver_city, driver_state, gender, race
               , row_number() over(partition by id_importat order by 1) as rn
          from ImportaViolations
         ) d
    WHERE rn > 1
);

Query 1:

select * from Proces1

Results:

| id_aux | city_aux | state_aux | g_aux | r_aux | rn |
|--------|----------|-----------|-------|-------|----|
|      1 |   London |   England |     F | BLACK |  2 |
|      1 |   London |   England |     F | BLACK |  3 |
|      1 |   London |   England |     M | WHITE |  4 |
|      2 |   Madrid |     Spain |     M | BLACK |  2 |
|      2 |   Madrid |     Spain |     M | BLACK |  3 |
|      2 |   London |   England |     F | WHITE |  4 |
|      2 |   London |   England |     M | WHITE |  5 |

Query 2:

select * from ImportaViolations

Results:

| id_importat | driver_city | driver_state | gender |  race |
|-------------|-------------|--------------|--------|-------|
|           1 |      London |      England |      M | WHITE |
|           2 |      Madrid |        Spain |      M | BLACK |
|           1 |      London |      England |      F | BLACK |
|           2 |      London |      England |      M | WHITE |
|           1 |      London |      England |      F | BLACK |
|           2 |      Madrid |        Spain |      M | BLACK |
|           2 |      London |      England |      F | WHITE |
|           1 |      London |      England |      M | WHITE |
|           2 |      London |      England |      F | WHITE |
Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
  • OK. Thanks. Please just be aware that "order" is vitally important when deciding what is the "first row" and that using `row_number() over()` even without specifying "order by" will affect the sequence simply because of the partitioning. – Paul Maxwell Dec 09 '17 at 01:41