1

I have the Employer table which has two columns, Ecode and Ename.

ecode   empname    

E1  NIKHIL
E2  Nikhil.
E3  Nikhil??
E4  sunita..
E5  sunita..
E6  sunita
E7  Mangesh
E8  HIREN
E9  HIREN..
E10 HIREN??
E11 Viren

I need the output to include duplicate values of Empname, ignoring special characters. Nikhil, Nikhil., Nikhil? all should be part of my output.

Expected Output:-

ecode   ename
E1  NIKHIL
E2  Nikhil.
E3  Nikhil??
E4  sunita..
E5  sunita..
E6  sunita
E8  HIREN
E9  HIREN..
E10 HIREN??
APC
  • 144,005
  • 19
  • 170
  • 281
Nikiyer
  • 21
  • 3

2 Answers2

2
with test (ecode, empname) as
(
select 'E1'  ,'NIKHIL'    from dual union all
select 'E2'  ,'Nikhil.'   from dual union all
select 'E3'  ,'Nikhil??'  from dual union all
select 'E4'  ,'sunita..'  from dual union all
select 'E5'  ,'sunita..'  from dual union all
select 'E6'  ,'sunita'    from dual union all
select 'E7'  ,'Mangesh'   from dual union all
select 'E8'  ,'HIREN'     from dual union all
select 'E9'  ,'HIREN.'    from dual union all
select 'E10' ,'HIREN??'   from dual union all
select 'E11' ,'Viren'     from dual
)
select ecode, empname,
dense_rank() over (order by upper(regexp_replace(empname, '\W'))) grp
from
   (select t.*,
    count(*) over (partition by upper(regexp_replace(empname, '\W'))) cnt
    from test t
   )
where cnt > 1
order by upper(regexp_replace(empname, '\W')), empname;

ECO EMPNAME         GRP
--- -------- ----------
E8  HIREN             1
E9  HIREN.            1
E10 HIREN??           1
E1  NIKHIL            2
E2  Nikhil.           2
E3  Nikhil??          2
E6  sunita            3
E5  sunita..          3
E4  sunita..          3

9 rows selected.
Amir Kadyrov
  • 1,253
  • 4
  • 9
  • Just needs an ORDER BY clause to be a perfect answer. – APC Mar 21 '19 at 08:20
  • need criteria for order by clause. – Amir Kadyrov Mar 21 '19 at 08:25
  • @APC, I do not understand what you mean. You are did not use "partition by" clause with "large data set" or you want to suggest another solution? Or may be we should not work with "large data set" and window funtions was designed for "small data set"? – Amir Kadyrov Mar 21 '19 at 09:43
  • "count(*) over (partition by ...)" is used to counting group's counts to find duplicates. If you want to show groups (as you said group them together) you may use dense_rank as I set for your by editing my answer. But if you want to data be sorted you must specify ORDER BY criteria as I said before. – Amir Kadyrov Mar 21 '19 at 11:22
  • By grouping, I just meant that the duplicates should follow sequentially in the result set. The ORDER BY is sufficient – APC Mar 21 '19 at 11:35
0

Here's one option: find duplicates on EMPNAME column by removing anything but letters (that's what '[^[:alpha:]]' does):

SQL> with test (ecode, empname) as
  2  (
  3  select 'E1'  ,'NIKHIL'    from dual union all
  4  select 'E2'  ,'Nikhil.'   from dual union all
  5  select 'E3'  ,'Nikhil??'  from dual union all
  6  select 'E4'  ,'sunita..'  from dual union all
  7  select 'E5'  ,'sunita..'  from dual union all
  8  select 'E6'  ,'sunita'    from dual union all
  9  select 'E7'  ,'Mangesh'   from dual union all
 10  select 'E8'  ,'HIREN'     from dual union all
 11  select 'E9'  ,'HIREN.'    from dual union all
 12  select 'E10' ,'HIREN??'   from dual union all
 13  select 'E11' ,'Viren'     from dual
 14  ),
 15  inter as
 16    (select ecode, empname, upper(regexp_replace(empname, '[^[:alpha:]]', '')) rc
 17     from test
 18    )
 19  select ecode, empname
 20  from inter
 21  where rc in (select rc from inter
 22               group by rc
 23               having count(*) > 1
 24              )
 25  order by to_number(regexp_substr(ecode, '\d+$'));

ECO EMPNAME
--- --------
E1  NIKHIL
E2  Nikhil.
E3  Nikhil??
E4  sunita..
E5  sunita..
E6  sunita
E8  HIREN
E9  HIREN.
E10 HIREN??

9 rows selected.

SQL>
Littlefoot
  • 131,892
  • 15
  • 35
  • 57
  • Clever use of regex to order the result set but on a large data set the inefficiency of calling `regexp_replace()` three times would start to bite. – APC Mar 21 '19 at 08:19
  • Yes, it would, @APC. So I deleted some, to make it prettier. – Littlefoot Mar 21 '19 at 08:23