0

I have two tables temp and md respectively. There is a field called uri_stem which has certain details that I want to omit from temp but not from md. I need to make a comparison that is able to compare certain patterns and remove them from temp if there are similar patterns in md.

Right now I am using this code to remove data similar to the patterns I want to omit, but I want some method that is able to compare the patterns from the md table rather than me hardcording each one. Hope the explanation is clear enough.

FROM 
  spfmtr01.tbl_1c_apps_log_temp 
where 
 uri_stem not like '%.js' and 
 uri_stem not like '%.css' and 
 uri_stem not like  '%.gif' 
 and uri_stem not like '%.png' 
 and uri_stem not like '%.html' 
 and uri_stem not like '%.jpg' 
 and uri_stem not like '%.jpeg' 
 and uri_stem not like '%.ico' 
 and uri_stem not like '%.htm' 
 and uri_stem not  like '%.pdf' 
 and uri_stem not  like '%.Png' 
 and uri_stem not  like '%.PNG'
Tomas Greif
  • 21,685
  • 23
  • 106
  • 155
  • [This](http://stackoverflow.com/questions/4928054/postgresql-wildcard-like-for-any-of-a-list-of-words) should help you partially – Tomas Greif Jun 11 '14 at 10:51

2 Answers2

0

This example is based on answer I mentioned in my comment.

SQLFiddle

Sample data:

drop table if exists a, b;
create table a (testedstr varchar);
create table b (condstr varchar);

insert into a values 
   ('aa.aa.jpg'),
   ('aa.aa.bjpg'), -- no match
   ('aa.aa.jxpg'), -- no match
   ('aa.aa.jPg'), 
   ('aa.aa.aico'), -- no match
   ('aa.aa.ico'), 
   ('bb.cc.dd.icox'), -- no match
   ('bb.cc.dd.cco');  -- no match

insert into b values ('jpg'), ('ico');

Explanation:

  • in table a we have strings we would like to test (stored in column testedstr)
  • in table b we have strings we would to like to use as testing expresions (stored in column condstr)

SQL:

with cte as (select '\.(' || string_agg(condstr,'|') || ')$' condstr from b)

select * from a, cte where testedstr !~* condstr;

Explanation:

  • in the first line we will aggregate all patterns we would like to test into one string; as a result we will get jpg|ico string (aggregated into single row).
  • in the second line we crossjoin tested table with our testing expression (from the first line) and use regular expression to perform the test.
  • the regular expression at the end looks like \.(jpg|ico)$

For older versions, you should use answer provided by @Bohemian. For my sample data it would look like (adjusted for multiple possible dots) this (SQLFiddle:

select 
  *
from 
  a
where 
  lower(reverse(split_part(reverse(testedstr),'.',1))) 
    not in (select lower(condstr) from b)

Without reverse function (SQLFiddle):

select 
  *,
  lower(split_part(testedstr,'.',length(testedstr)- length(replace(testedstr,'.','')) + 1)) as extension
from 
  a
where 
  lower(split_part(testedstr,'.',length(testedstr)- length(replace(testedstr,'.','')) + 1)) not in (select lower(condstr) from b)
Community
  • 1
  • 1
Tomas Greif
  • 21,685
  • 23
  • 106
  • 155
  • @user3455309 Exactly as it is written, did you check example on [SQLFiddle](http://sqlfiddle.com/#!15/70c2c/4)? – Tomas Greif Jun 11 '14 at 11:57
  • Hi, Thanks for the help but we use a very low version of Postgresql. 8.2 I guess so it's not supporting cte. Any other alternative to this code? – user3455309 Jun 11 '14 at 12:19
  • 1
    @user3455309 Option 1: Upgrade, Option2: See my edited answer. – Tomas Greif Jun 11 '14 at 12:37
  • Thanks. While upgrade sounds a real good option its unfortunately not in my hands. I tried your query but its throwing an error 'function reverse(character varying) does not exist'. – user3455309 Jun 11 '14 at 12:58
  • @user3455309 Yeah, my fault, there is no `reverse` function in ancient versions. See my updated answer, this time I checked that all functions are supported by 8.2 – Tomas Greif Jun 11 '14 at 13:13
0

First let's refactor the many conditions into just one:

where lower(substring(uri_stem from '[^.]+$')) not in ('js', 'css', 'gif', 'png', 'html', 'jpg', 'jpeg', 'ico', 'htm', 'pdf')

In this form, it's easy to see how the list of values can be selected instead of coded:

where lower(substring(uri_stem from '[^.]+$')) not in (
    select lower(somecolumn) from sometable)

Note the use of lower() to avoid problems of dealing with variants of case.

You could also code it as a join:

select t1.*
from mytable t1
left join sometable t2
    on lower(somecolumn) = lower(split_part(uri_stem, '.', 2))
where t2.somecolumn is null -- filter out matches
Bohemian
  • 412,405
  • 93
  • 575
  • 722
  • This assumes that there is only single dot in the string. Wouldn't it be better to use something like `select lower(reverse(split_part(reverse('aaa.bbb.cCc'),'.',1)))` to get part of string after the last dot in string? – Tomas Greif Jun 11 '14 at 11:17
  • Exactly what I wanted to mention. The string has more than one dot. Here is an example of a string that needs to be compared. /Core/Scripts/jquery.flipCounter.1.2.js – user3455309 Jun 11 '14 at 11:23
  • 1
    @TomasGreif yes, you're right, but there's an easier fix than that: you can use regex `substring(uri_stem from '[^.]+$')` to extract all trailing non-dot chars - see edit – Bohemian Jun 11 '14 at 15:00
  • @user3455309 my first cut didn't cater for multiple dots, but see edit for how you can use regex `substring(uri_stem from '[^.]+$')` to extract all trailing non-dot chars, which works for any number of dots (including no dots) – Bohemian Jun 11 '14 at 15:03