0

I'm having trouble with a join query returning duplicate records and I'm hoping someone can help me figure out what is going on.

I've reviewed similar posts here, and have refined my query somewhat based on that information, but I'm not really a SQL guy and I just can't seem to get the right combination.

Basically I have 2 tables, Attrib and AgentLV (Live).

Table Attrib (really table Agent_Attribute) contains information about what groups an agent belongs to - The agent is identified by column Attrib.SkillTargetID, and Attrib.AttributeID identifies the specific group we're interested in (an agent can belong to multiple groups)

Table 2 contains events associated with a specific agent. The agent Identifier is AgentLV.SkillTargetID. Agent events for a specific agents can occur from any group the agent is part of, and the information includes a timestamp.

The information I am trying to extract is: for agents in a specific group, what is the most recent event for that agent.

It doesn't matter if the event is associated with the group being queried, or another group the agent is a member of. I just care what the agents in that group were doing most recently.

So for a specific group, there will be 1 row for each agent in the group, for this query:

SELECT TOP 100 
    Attrib.SkillTargetID AS AttribSKID,
    Attrib.AttributeID,
    [AttributeValue]
FROM 
    Agent_Attribute Attrib --returns 1 STID by itself
WHERE
    Attrib.AttributeID = 5068
ORDER BY
    AttributeValue DESC

Like This:

6221    5068    5
6210    5068    5
6197    5068    5
6192    5068    5
6184    5068    5

But for the AgentLV table, there will be multiple rows.

So, ie, for agent 6221, multiple events are returned by query

SELECT TOP 5 
    AgentLV.SkillTargetID AS AgntLvSKID,
    AgentLV.DateTime,
    AgentLV.Event
FROM
    [prod_awdb].[dbo].[Agent_Event_Detail] AgentLV
WHERE 
    AgentLV.SkillTargetID = 6221
    AND AgentLV.[DateTime] >  GETDATE() - 1

Output:

6221    2019-06-11 07:55:49.000 1
6221    2019-06-11 07:55:53.000 3
6221    2019-06-11 11:30:00.000 3
6221    2019-06-11 11:45:00.000 3
6221    2019-06-11 11:46:20.000 3

My goal is to construct a query that returns 1 row - with the most recent timestamp - for each user in the group. So for group 5068, agent 6221 should return ONLY

Attrib.SkillTargetID    AgentLV.SillTargetID    Attrip.AttributeID  AttribValue AgentLV.DateTime    AgetnLV.Event
6221            6221            5068            5       2019-06-11:46       3

To that end (In its simplest form) I tried the query below:

SELECT TOP 100 
    Attrib.SkillTargetID AS AttribSKID,
    AgentLV.SkillTargetID AS AgntLvSKID,
    Attrib.AttributeID,
    [AttributeValue],
    AgentLV.DateTime
FROM 
    Agent_Attribute Attrib --returns 1 STID by itself
INNER JOIN
    [prod_awdb].[dbo].[Agent_Event_Detail] AgentLV 
         ON Attrib.SkillTargetID = (SELECT TOP 1 AgentLV.SkillTargetID 
                                    FROM [prod_awdb].[dbo].[Agent_Event_Detail] AgentLV
                                    WHERE AgentLV.SkillTargetID = Attrib.SkillTargetID
                                    ORDER BY DateTime DESC)
WHERE
    Attrib.AttributeID = 5068
    AND AgentLV.DateTime > GETDATE() - 1
ORDER BY 
    AttributeValue DESC

but even though I'm trying to return 1 row from AgentLV by returning 'TOP 1', I wind up with many rows for each agent, like this:

AttribSKID  AgntLvSKID  AttributeID AttributeValue  DateTime    
6192        5461        5068            5   2019-06-11 21:01:12.007 11:59:08.050
6184        5461        5068            5   2019-06-11 21:01:12.007 11:59:08.050
6221        5461        5068            5   2019-06-11 21:01:12.007 11:59:08.050
6184        5461        5068            5   2019-06-11 21:01:12.000 11:59:08.000
6221        5461        5068            5   2019-06-11 21:01:12.000 11:59:08.000
6192        5461        5068            5   2019-06-11 21:01:12.000 11:59:08.000
6192        6758        5068            5   2019-06-11 21:01:05.007 18:52:13.077
6184        6758        5068            5   2019-06-11 21:01:05.007 18:52:13.077
6221        6758        5068            5   2019-06-11 21:01:05.007 18:52:13.077
6192        5798        5068            5   2019-06-11 21:01:02.007 11:58:21.550
6184        5798        5068            5   2019-06-11 21:01:02.007 11:58:21.550
6221        5798        5068            5   2019-06-11 21:01:02.007 11:58:21.550
6192        6419        5068            5   2019-06-11 21:01:01.007 10:02:28.563
6184        6419        5068            5   2019-06-11 21:01:01.007 10:02:28.563
6221        6419        5068            5   2019-06-11 21:01:01.007 10:02:28.563

I also tried changing the first row to

SELECT Distinct TOP 100 

Can anyone tell me what I'm doing wrong?

Squirrel, Here is the modified version I tried, of your query:

SELECT *
FROM   (

       SELECT rn = row_number() over (partition by Attrib.SkillTargetID

                                          order by AgentLV.DateTime desc),
   Attrib.SkillTargetID --AS AttribSKID
   ,AgentLV.SkillTargetID AS AgntLvSKID
  ,Attrib.AttributeID
  --,Attrib.Description
  ,[AttributeValue]
  ,AgentLV.Duration
  ,AgentLV.DateTime
  --,AgentLV.LoginDateTime
       FROM   Agent_Attribute Attrib

       INNER JOIN [prod_awdb].[dbo].[Agent_Event_Detail] AgentLV
       ON     Attrib.SkillTargetID = AgentLV.SkillTargetID 
   ) AS D
WHERE  D.rn = 1
  AND D.AttributeID>5067 AND D.AttributeID<5071
  AND D.[DateTime] >  GetDate()-1 --specify fraction of day
  order by D.AttributeID

I expect to get returned data that looks like this:

AttribSKID  AgntLvSKID  AttributeID AttributeValue  DateTime
6197        6197        5068        5       2019-6-12 8:40
6183        6183        5068        5       2019-6-12 8:40
6221        6221        5068        5       2019-6-12 8:39
6192        6192        5068        5       2019-6-12 8:39
6184        6184        5068        5       2019-6-12 8:40
6210        6210        5068        5       2019-6-12 8:40

Actually eventually it will look like this - but that will come later:

AttribSKID  AgntLvSKID  AgentDTStgsID   DTsettings  Attrib      AttributeID AttributeValue  AgentNm     DN      Duration    DateTime    LoginDateTime
6197        6417        5012        US.Dtsettings   US.Attrib.Name  5068        5       US.NameHere 15551112222 185     2019-6-12 8:40  2019-6-12 8:15

Even small Test Data extracts for two tables would be quite large - Is there a way to attach files here? Im unable to find it. For the second table Id need to show about 500 rows for it to be useful.

wolf8963
  • 61
  • 2
  • 7
  • Possible duplicate of [Select first row in each GROUP BY group?](https://stackoverflow.com/questions/3800551/select-first-row-in-each-group-by-group) – philipxy Jun 12 '19 at 08:51
  • Can you please provide a small sample data from each of the tables with the corresponding expected result ? – Squirrel Jun 13 '19 at 01:16

1 Answers1

0

you can make use or row_number() window function to generate a sequence no per SkillTargetID

SELECT *
FROM   (
           SELECT rn = row_number() over (partition by Attrib.SkillTargetID
                                              order by AgentLV.DateTime desc),
                  {other columns that you required}
           FROM   Agent_Attribute Attrib
           INNER JOIN [prod_awdb].[dbo].[Agent_Event_Detail] AgentLV
           ON     Attrib.SkillTargetID = AgentLV.SkillTargetID 
       ) AS D
WHERE  D.rn = 1
Squirrel
  • 23,507
  • 4
  • 34
  • 32
  • Wow - that was a quick response! I attempted a variation of your suggestion and have a couple of concerns. The query ran over 30 seconds and since this will be run on a live interactive system I want to be careful not to do anything that will affect performance. My next concern is that,when I placed WHEREs at the bottom of the query, to specify a specific AttributeID and Date, the AttributeID I searched for in the original example, returns 0 rows - I had to expand the Attribute range to insure the query was working. I know that ID should exist (5068) BC my original query returns them. – wolf8963 Jun 12 '19 at 05:15
  • 1
    please update your original question with the updated query and sample data with the expected result – Squirrel Jun 12 '19 at 05:20
  • PS on the updated post the 'AND D.AttributeID>5067 and D.AttributeID<5071 would normally be just 'AND D.AttributeID=5068' ie - but that is currently returning 0 rows for some strange reason. Also - could you comment on the long run time? Is there another way to do this that is more efficient? thanks – wolf8963 Jun 12 '19 at 14:49