0

I am writing a data export where I need to return one row from a selection where there may be multiple rows. In this case, the second table is the telephone_current table. This table includes a row for several telephone types (CA, MA, PR, etc.), and they are not in any particular order. If the individual has a CA, I need to include that record; if not, then I would use either type MA or PR.

The query below works, technically, but it will run excruciatingly slow (10 minutes or more).

I need advice to fix this query to get one row (record) per individual. The slowdown occurs when I include the self join telephone_current tc. Note. I've also moved the AND into the WHERE clause, which runs with the time delay.

SELECT distinct igp.isu_id PersonnelNumber
    , igp.preferred_first_name FirstName
    , igp.current_last_name LastName
    , NULL Title
    , igp.current_mi MiddleInitial
    , pd.email_preferred_address
    , tc.phone_number_combined
    , igp.isu_username networkID
    , '0' GroupID
    , e.home_organization_desc GroupName
    , CASE
        WHEN  substr(e.employee_class,1,1) in ( 'N', 'C') THEN 'staff'
        WHEN substr(e.employee_class,1,1) = 'F' THEN 'faculty'
        ELSE 'other'
    END GroupType
    FROM isu_general_person igp
    JOIN person_detail pd ON igp.person_uid = pd.person_uid
    JOIN telephone_current tc ON igp.person_uid = tc.entity_uid
        AND tc.phone_number = (
            SELECT p.phone_number
            FROM telephone_current p
            WHERE tc.entity_uid = p.entity_uid
            ORDER BY phone_type
           FETCH FIRST 1 ROW ONLY
            )
   LEFT JOIN employee e ON igp.person_uid = e.person_uid
--   LEFT JOIN faculty f ON igp.person_uid = f.person_uid
WHERE 1=1
AND e.employee_status = 'A'
AND substr(e.employee_class,1,1) in ( 'N', 'C', 'F')
AND igp.isu_username IS NOT NULL
;
Aaron
  • 51
  • 6
  • 1
    Are columns you use in joins indexed? Phone number included. – Littlefoot Nov 22 '21 at 19:10
  • Cross apply with top 1 or use a analytic row number.... – xQbert Nov 22 '21 at 19:22
  • telephone_current is indexed on entity_uid the other tables are indexed on person_uid. entity_uid = person_uid – Aaron Nov 22 '21 at 19:30
  • 2
    Post at least the [execution plan](https://stackoverflow.com/a/34975420/4808122) so we can see where your problem is located. Some *additional* notes. 1) using `SELECT distinct` usually indicates that your *join predicates* are **not OK**. 2) while joining *whole or large parts* of tables usually **no index will help**. 3) filtering on `1=1` has (if you are lucky) no effect and should be removed. – Marmite Bomber Nov 22 '21 at 20:28
  • If I had to guess in the above; the bottleneck is in the employee_class substring. It has to do it for every record to determine if it's in your NCF list... add a function based index for it so your where clause and select case expression can benefit. As a function based index; each row would already have the 1st letter of the class separated; and it would be indexed. So the limit and the select should both more efficient. – xQbert Nov 22 '21 at 21:34
  • Returning one row per select or per group are faqs. When you ask re code give a [mre]. Re SQL performance research it & asking about it. [ask] [Help] PS LEFT JOIN returns INNER JOIN rows UNION ALL unmatched left table rows extended by NULLs. Always know what INNER JOIN you want as part of an OUTER JOIN. After a LEFT JOIN a WHERE, INNER JOIN or HAVING that requires a right [sic] table column to be not NULL removes any rows with introduced NULLs, ie leaves only INNER JOIN rows, ie "turns OUTER JOIN into INNER JOIN". You have that. – philipxy Nov 22 '21 at 22:33

2 Answers2

1

We did identify problem with the index on the telephone_current table. Once that was resolved, both the versions provided by xQbert worked to provide the single-row result for each individual. The version using WITH BaseData ran in approximately 12 seconds. However, this version returned all rows in 2.4 seconds.

SELECT distinct igp.isu_id PersonnelNumber
    , igp.preferred_first_name FirstName
    , igp.current_last_name LastName
    , NULL Title
    , igp.current_mi MiddleInitial
    , pd.email_preferred_address
    , tc.phone_number_combined
    , igp.isu_username networkID
    , '0' GroupID
    , e.home_organization_desc GroupName
    , CASE
        WHEN  substr(e.employee_class,1,1) in ( 'N', 'C') THEN 'staff'
        WHEN substr(e.employee_class,1,1) = 'F' THEN 'faculty'
        ELSE 'other'
    END GroupType
    FROM isu_general_person igp
    JOIN person_detail pd 
      ON igp.person_uid = pd.person_uid
    CROSS APPLY (SELECT xtc.phone_number_combined
                 FROM telephone  xtc
                 WHERE igp.person_uid = xtc.entity_uid
                 ORDER BY case when phone_type = 'CA' then 1 
                               when phone_Type in ('MA','PR') then 2 
                               else 3 end, 
                           phone_Type, 
                           phone_number_combined
                 FETCH FIRST 1 ROW ONLY) tc 
   LEFT JOIN employee e ON igp.person_uid = e.person_uid
--   LEFT JOIN faculty f ON igp.person_uid = f.person_uid
WHERE 1=1
AND e.employee_status = 'A'
AND substr(e.employee_class,1,1) in ( 'N', 'C', 'F')
AND igp.isu_username IS NOT NULL
Aaron
  • 51
  • 6
  • I'd recommend going back through old questions and accepting answers (even if it's your own) and upvoting those that help. It shows you care about the feedback those have provided and helps others understand what solutions worked or aided in getting to a solution for your questions. – xQbert Nov 29 '21 at 22:38
0

Example using row_number() analytic and a common table expression. This limits to one phone per person by creating a partition/group of numbers under a given Entity_Uid orders this by a case expression and then assigns row number based on that case expression defined order then phone type, then phone number. The row number is then used to limit the results to just 1 phone number.

WITH BaseData as (

SELECT distinct igp.isu_id PersonnelNumber
    , igp.preferred_first_name FirstName
    , igp.current_last_name LastName
    , NULL Title
    , igp.current_mi MiddleInitial
    , pd.email_preferred_address
    , tc.phone_number_combined
    , igp.isu_username networkID
    , '0' GroupID
    , e.home_organization_desc GroupName
    , CASE
        WHEN  substr(e.employee_class,1,1) in ( 'N', 'C') THEN 'staff'
        WHEN substr(e.employee_class,1,1) = 'F' THEN 'faculty'
        ELSE 'other'
    END GroupType,
    row_number() over (PARTITION BY Entity_Uid ORDER BY case when phone_type ='CA' then 1 
                                                             when phone_Type in ('MA','PR') then 2 
                                                             else 3 end, phone_Type, Phone_number) RN
    FROM isu_general_person igp
    JOIN person_detail pd ON igp.person_uid = pd.person_uid
    JOIN telephone_current tc ON igp.person_uid = tc.entity_uid
   LEFT JOIN employee e ON igp.person_uid = e.person_uid
--   LEFT JOIN faculty f ON igp.person_uid = f.person_uid
WHERE 1=1
AND e.employee_status = 'A'
AND substr(e.employee_class,1,1) in ( 'N', 'C', 'F')
AND igp.isu_username IS NOT NULL)

SELECT * 
FROM BaseData 
WHERE RN = 1
;

Example As cross apply: Cross apply avoids the need of the analytic and basically says; hey; for each matching igp.person_uid = xtc.entity_uid, get the first record based on the order defined in the subquery. quit when you've got the 1st record for each user

SELECT distinct igp.isu_id PersonnelNumber
    , igp.preferred_first_name FirstName
    , igp.current_last_name LastName
    , NULL Title
    , igp.current_mi MiddleInitial
    , pd.email_preferred_address
    , tc.phone_number_combined
    , igp.isu_username networkID
    , '0' GroupID
    , e.home_organization_desc GroupName
    , CASE
        WHEN  substr(e.employee_class,1,1) in ( 'N', 'C') THEN 'staff'
        WHEN substr(e.employee_class,1,1) = 'F' THEN 'faculty'
        ELSE 'other'
    END GroupType,
    FROM isu_general_person igp
    JOIN person_detail pd 
      ON igp.person_uid = pd.person_uid
    CROSS APPLY (SELECT xtc.phone_number
                 FROM telephone_current  xtc
                 WHERE igp.person_uid = xtc.entity_uid
                 ORDER BY case when phone_type = 'CA' then 1 
                               when phone_Type in ('MA','PR') then 2 
                               else 3 end, 
                           phone_Type, 
                           Telephone_current
                 FETCH FIRST 1 ROW ONLY) tc 
   LEFT JOIN employee e ON igp.person_uid = e.person_uid
--   LEFT JOIN faculty f ON igp.person_uid = f.person_uid
WHERE 1=1
AND e.employee_status = 'A'
AND substr(e.employee_class,1,1) in ( 'N', 'C', 'F')
AND igp.isu_username IS NOT NULL
xQbert
  • 34,733
  • 2
  • 41
  • 62
  • I love the two methods you prescribed to get the correct (and only a single) phone record. I have never used a CASE in an ORDER BY clause; it is elegant. These are elegant solutions. Unfortunately, the query still runs forever; it must be doing a table scan or have some other index problem. If I limit the rows by adding AND igp.person_uid in ( 25245,3178, 27637, 12962) The script runs quickly and the results are as needed. But when I let it runs slow. – Aaron Nov 22 '21 at 21:22
  • I'd ensure you have an index on isu_general_person.person_uid , telephone_Current.Entity_Uid, employee .person_uid , also consider function based indexes for you substring on employee class as well as my case expression for setting the phone_type priority. the generation of the records to compare can be pre-empted by the function based indexes. but yeah a execution plan to find the bottle necks is what's next. As a function based index; each row would already have the 1st letter of the class separated; and it would be indexed. So the limit and the select should both more efficient. – xQbert Nov 22 '21 at 21:35