2

What is wrong with my SQL code?

I tried to eliminate duplicate rows following this answer

But I keep getting the following error:

near "(": syntax error: SELECT rn = ROW_NUMBER() OVER (

Here is the SQL code:

SELECT 
    rn = ROW_NUMBER() OVER (PARTITION BY s.stop_id, s.stop_name 
                            ORDER BY s.stop_id, s.stop_name)
FROM stops s

I read somewhere that it has to do with SQL versions or the usage of sqlite3 ??

Here some additional information to the problem:

I have the beginning table:

table_beginning =
[some_text0 , some_text1 , some_text2 ,  some_text3   ]
[ bla_1     ,   monday   ,    red     , bla_something ]
[ bla_77    ,   tuesday  ,   green    , bla_other     ]
[ bla_99    ,   monday   ,    blue    , bla_anything  ]
[ bla_00    ,  wednesday ,    red     , bla_maybe     ]
[ bla_55    ,   monday   ,   violet   , bla_more      ]
[ bal_66    ,   monday   ,    red     , bla_kind      ]
[ bal_22    ,   monday   ,    red     , bla_stuff     ]

I would like to end up with a table that has no doublicates concerning col_2 and col_3 (not caring whatever col_1 and col_4 are !!)

Such as:

table_final1 =
[some_text0 , some_text1 , some_text2 ,  some_text3   ]
[ bla_1     ,   monday   ,    red     , bla_something ]
[ bla_77    ,   tuesday  ,   green    , bla_other     ]
[ bla_99    ,   monday   ,    blue    , bla_anything  ]
[ bla_00    ,  wednesday ,    red     , bla_maybe     ]
[ bla_55    ,   monday   ,   violet   , bla_more      ]

(--> which ones of the rows that are kicked out does not matter. The accepted final table could therefore also look like:

table_final2 =
[some_text0 , some_text1 , some_text2 ,  some_text3   ]
[ bla_77    ,   tuesday  ,   green    , bla_other     ]
[ bla_99    ,   monday   ,    blue    , bla_anything  ]
[ bla_00    ,  wednesday ,    red     , bla_maybe     ]
[ bla_55    ,   monday   ,   violet   , bla_more      ]
[ bal_66    ,   monday   ,    red     , bla_kind      ]

or

table_final3 =
[some_text0 , some_text1 , some_text2 ,  some_text3   ]
[ bla_77    ,   tuesday  ,   green    , bla_other     ]
[ bla_99    ,   monday   ,    blue    , bla_anything  ]
[ bla_00    ,  wednesday ,    red     , bla_maybe     ]
[ bla_55    ,   monday   ,   violet   , bla_more      ]
[ bal_22    ,   monday   ,    red     , bla_stuff     ]

All that matters is that col_2 and col_3 have never the same two entries !

As you can see some_text1 = monday AND some_text2 = red exists now only once !! (eliminating doublicates from the point of view only looking at col_2 and col_3)

As of content in col_1 and col_4 - I don't care at all what is in there ! I'm only concerned about what is inside col_2 and col_3 (not having any doublicates there !)

One solution:

I figured out a way (but maybe there is a more elegant one??)

CREATE TABLE table_intermediate AS
  SELECT DISTINCT col_2, col_3
FROM table_beginning;

--> This creates an intermediate table --> with the DISTINCT keyword this does eliminate doublicates --> (disadvantage, I loose col_1 and col_4 information completely)

Maybe there is a solution where I can keep information on col_1 and col_4 ??? (again, which of the col_1 or col_4 I do not care !)

iKK
  • 6,394
  • 10
  • 58
  • 131
  • 3
    SQLite does not support window functions. See this related post: https://stackoverflow.com/q/49069627/460557 – Jorge Campos May 05 '18 at 22:43
  • Thank you ! How can I eliminate doublicate rows then ? Is there another solution ? – iKK May 05 '18 at 22:44
  • 1
    just use distinct: `select distinct yourcolumns from....` know that distinct is for all columns... – Jorge Campos May 05 '18 at 22:45
  • Yes but I have columns with doublicates and the other columns vary - therefore DISTINCT does not what I want (I would like to eliminate doublicates of 1 column - not caring what the others show) – iKK May 05 '18 at 22:48
  • 2
    Add some sample data and from it the desired result in your question, right now it is a clear example of an [XY problem](http://xyproblem.info/) you said what you want instead of explain what you need. – Jorge Campos May 05 '18 at 22:50
  • 1
    @Jorge Campos: (sorry for the delay in response - timeshift ;)) - I added more information. I also think that your suggestion to use `DISTINCT` brought me forward - remains the question if there is a solution where I don't "loose" col_1 and col_4 information with the `DISTINCT` keyword ? – iKK May 06 '18 at 10:23
  • 1
    Ah - maybe Gordon Linoff's solution is it ?? (I just saw it coming in...) – iKK May 06 '18 at 10:28
  • That is valid syntax. But same columns for partition by and order by makes no sense. – paparazzo May 06 '18 at 10:39
  • It makes no sense in 99% of all cases - I agree. But some exceptions are making life more interesting ;) Thanks for the input ! – iKK May 06 '18 at 10:53
  • Yes, Gordon's answer is the way to go! – Jorge Campos May 07 '18 at 03:06

1 Answers1

2

In SQLite, you can typically use rowid:

select s.*
from stops s
where s.rowid = (select min(s2.rowid)
                 from stops s2
                 where s2.stop_id = s.stop_id and s2.stop_name = s.stop_name
                );

I'm not sure if this is what you really need. But this seems to be what you want to do with row_number(). If this doesn't hit the spot, then ask another question with sample data and desired results.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786