1

http://sqlfiddle.com/#!9/b98ea/1 (Sample Table)

I have a table with the following fields:

  1. transfer_id
  2. src_path
  3. DH_USER_ID
  4. email
  5. status_state
  6. ip_address

src_path field contains a couple of duplicates filename values but a different folder name at the beginning of the string.

Example:

  1. 191915/NequeVestibulumEget.mp3
  2. /191918/NequeVestibulumEget.mp3
  3. 191920/NequeVestibulumEget.mp3

I am trying to do the following:

  1. Set status_state field to 'canceled' for all the duplicate filenames within (src_path) field except for one.

I want the results to look like this: http://sqlfiddle.com/#!9/5e65f/2

*I apologize in advance for being a complete noob, but I am taking SQL at college and I need help.

Rudy Herdez
  • 195
  • 1
  • 1
  • 8
  • Hi again Rudy. The second row start with a `/` is that ok or is a typo? – Juan Carlos Oropeza Sep 04 '15 at 17:27
  • possible duplicate of [Find duplicate rows with PostgreSQL](http://stackoverflow.com/questions/14471179/find-duplicate-rows-with-postgresql) – Jakub Kania Sep 04 '15 at 17:34
  • 1
    Eliminating duplicates is a popular task. You didn't show what you tried so far. – Jakub Kania Sep 04 '15 at 17:35
  • I have tried several usuals examples. But since src_path contains a different path at the beginning of the string, I am lost on how to either pass that to a variable or sorted based on the duplicate of the filenames. So far, I have select substring(src_path, '[^\\//]*$') from priority_transfer; to only give me filenames (without the slash) – Rudy Herdez Sep 04 '15 at 17:38
  • @JakubKania Even when is similar. The duplicated part have to be calculate first before use the normal process. – Juan Carlos Oropeza Sep 04 '15 at 17:38
  • In your fiddle looks like all duplicate filename are group by `transfer_id` is that always the case? even when src_pth is different – Juan Carlos Oropeza Sep 04 '15 at 18:12
  • sorry, that WAS a typo on the random data generator website. Transfer_id should be incremental. I updated the fiddle to reflect this change. http://sqlfiddle.com/#!9/b98ea/1 – Rudy Herdez Sep 04 '15 at 18:16
  • what about filename `\Sollicitudin.mp3` is ` \ ` allow as part of the filename ? – Juan Carlos Oropeza Sep 04 '15 at 18:18
  • Yes it could be either slash \ or /. – Rudy Herdez Sep 04 '15 at 18:21
  • Well that is very weird. That should be some validation on the input to update that. – Juan Carlos Oropeza Sep 04 '15 at 18:22
  • Basically, the example says that src_path gets updated based on the uploader's OS. If windows then "C:\somepath\filename.mov, if Linux then "/home/user/filename.mov" – Rudy Herdez Sep 04 '15 at 18:24
  • Your want all dup set to `canceled` what about the original? doesnt matter or one should have `pending` ?? In your case that file name have `pending` and `queued` .... Also if no dup, we dont change status? – Juan Carlos Oropeza Sep 04 '15 at 18:29
  • I want to leave one untouched and the rest set to canceled. It doesn't matter the status, I just need to set all dups to canceled except one. If no dups, then leave status the same. – Rudy Herdez Sep 04 '15 at 18:37

2 Answers2

1

SQL Fiddle Demo

  • fix_os_name: Fix the windows path string to unix format.
  • file_name: Split the path using /, and use char_length to bring last split.
  • drank: Create a seq for each filename. So unique filename only have 1, but dup also have 2,3 ...
  • UPDATE: check if that row have rn > 1 mean is a dup.

.

Take note the color highlight is wrong, but code runs ok.

with fix_os_name as (
    SELECT transfer_id, replace(src_path,'\','/') src_path, 
    DH_USER_ID, email, status_state, ip_address
    FROM priority_transfer p
),  
file_name as (
    SELECT 
       fon.*,
       split_part(src_path,
                  '/',
                  char_length(src_path) - char_length(replace(src_path,'/','')) + 1
                 ) sfile
    FROM fix_os_name fon
), 
drank as (
    SELECT 
        f.*,
        row_number() over (partition by sfile order by sfile) rn
    from file_name f
)
UPDATE priority_transfer p
SET status_state = 'canceled'
WHERE EXISTS ( SELECT *
               FROM drank d
               WHERE d.transfer_id = p.transfer_id
               AND  d.rn > 1);

ADD: One row is untouch

enter image description here

Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
0

Use the regexp_matches function to separate the file name from the directory. From there you can use distinct() to build a table with unique values for the filename.

select
regexp_matches(src_path, '[a-zA-Z.0-9]*$') , *
from priority_transfer
;
PabTorre
  • 2,878
  • 21
  • 30
  • You should try your query in the fiddle. As you can see there is window and unix folders. – Juan Carlos Oropeza Sep 04 '15 at 19:28
  • You are correct @JuanCarlosOropeza, I didn't see this before. Modified the query, so that it gets all the file names now, regardless of win or unix folders... thnx! – PabTorre Sep 04 '15 at 19:37
  • Sorry pat still doesnt work. You need remove the folder portion and leave the remaing filename after '\' or '/' sometimes doesnt have folder. Let me know if you can work a regexp, i have to convert all '\' to '/' and then split by '/' – Juan Carlos Oropeza Sep 04 '15 at 19:40
  • How about select regexp_matches(src_path, '[^\\//]*$') from priority_transfer; ? – Rudy Herdez Sep 04 '15 at 20:09
  • select substring(src_path, '[^\\//]*$') from priority_transfer; ....Only gives me filenames after either / or \ – Rudy Herdez Sep 04 '15 at 20:15
  • @JuanCarlosOropeza that's strange. When I run the query in Fiddle it catches all of the file names... http://sqlfiddle.com/#!15/5e65f/17/0 – PabTorre Sep 04 '15 at 20:40
  • Dont ask me. Im only know basic regexp, and i already tell pabtorre his solution didnt work,. – Juan Carlos Oropeza Sep 04 '15 at 20:45