2

I have a query that fetches system data about a database

$data = Query "SELECT [server_name]
      ,[sessionID]
      ,[user]
      ,[ElapsedTime]
from table"
$data | Export-Csv -Path $Path\data.csv -NoTypeInformation

Recently I've noticed there are duplicate entries for sessionID in the data exported. Is there an option to select distinct by session ID only?

Something like

Query "SELECT [server_name]
      ,SELECT DISTINCT[sessionID]
      ,SELECT [user]
      ,[ElapsedTime]
from table"

For example, if the table has:

table

The exported csv should only contain:

desired table

Hadi
  • 36,233
  • 13
  • 65
  • 124
Cataster
  • 3,081
  • 5
  • 32
  • 79
  • Are you saying all the fields are duplicated, in which case this is just a de-duplication exercise, or that you want to pick one record per `sessionID`, in which case we need to know what the selection criteria should be? – Eric Brandt Mar 19 '19 at 21:21
  • for sql server you can simple use DISTINCT after SELECT – Hasan Mahmood Mar 19 '19 at 21:24
  • @HasanMahmood but if i do that then everything would have to be ditinct. meaning if elapsed time is the same accross multiple session ID's, wouldnt it be filtered out even if the session ID is different? – Cataster Mar 19 '19 at 21:26
  • @EricBrandt check new pictures i added. i know i can use SELECT distinct, but then what if there is same elpased time or same user in another entry but has a different session ID. wouldnt it be ignored by the select destinct? – Cataster Mar 19 '19 at 21:27
  • it will be DISTINCT per row, meaning all rows values combine a DISTINCT. try that and let us know. Whole DISTINCT will give you the result you want in your example. – Hasan Mahmood Mar 19 '19 at 21:28
  • @HasanMahmood oh so every single column value has to match in order for it to count as duplicate? – Cataster Mar 19 '19 at 21:29
  • @Cataster yes, you are right. – Hasan Mahmood Mar 19 '19 at 21:29
  • @HasanMahmood so just to make sure, if one of the columns, say elapsed time, is the same for multiple entries, but the session ID's are different, then the row will be exported because session ID is different even if elapsed time is same? – Cataster Mar 19 '19 at 21:30
  • @HasanMahmood, I think OP is wanting to "select distinct by session ID only". https://stackoverflow.com/questions/6127338/sql-mysql-select-distinct-unique-but-return-all-columns seems like what the OP is looking for – jmesolomon Mar 19 '19 at 21:35
  • @jmesolomon yep! – Cataster Mar 19 '19 at 21:36

2 Answers2

3

Little too much for a comment.

This query will filter off any rows where all of the values are the same:

$data = Query "SELECT DISTINCT [server_name]
      ,[sessionID]
      ,[user]
      ,[ElapsedTime]
from table"
$data | Export-Csv -Path $Path\data.csv -NoTypeInformation

If any column has a different value, you'll get two (or more) records.

EDIT:

On the other hand, if you just want one record per sessionID, this query will return just that. The ROW_NUMBER() in the ORDER BY requires an internal ORDER BY clause, but it sounds like you don't care what it sorts by, so this is essentially a random ordering to get down to one row.

$data = Query "SELECT TOP (1) WITH TIES [server_name]
      ,[sessionID]
      ,[user]
      ,[ElapsedTime]
from table
ORDER BY ROW_NUMBER() OVER (PARTITION BY sessionID ORDER BY update_time DESC) "
$data | Export-Csv -Path $Path\data.csv -NoTypeInformation

2nd EDIT: Added the update_time to the sort criteria.

Eric Brandt
  • 7,886
  • 3
  • 18
  • 35
  • ok so i just tried it out. the problem is the session ID is duplicated regardless. meaning i dont care for the other values whether they are distinct or not. i only care that session ID is distinct. if session ID is repeated but elapsed time is different, i still want to delete this entry and keep only one because the session ID is the same – Cataster Mar 19 '19 at 21:34
  • yep. basically if you look at the second table, it did not elimnate the 5432 session ID entry even though the elapsed time as well as the user are the same. that is because i only want to have different session ID's. that is the primary key – Cataster Mar 19 '19 at 21:38
  • suppose i have an update_time column. is it possible to get the latest session ID entry base don that? – Cataster Mar 19 '19 at 21:45
  • wait now that i think about it, doesnt Top (1) already get the latest entry? – Cataster Mar 19 '19 at 21:48
  • Not without an `ORDER BY`. – Eric Brandt Mar 19 '19 at 21:50
  • got it. im testing it out :) – Cataster Mar 19 '19 at 21:50
  • question, since i am also exporting the update time, wouldnt i have to include the column update_time as part of the select? – Cataster Mar 19 '19 at 21:51
  • I liked using `SELECT TOP (1) WITH TIES` – Hadi Mar 19 '19 at 22:02
  • @Cataster, yeah, you'll want to tweak the code to suit your actual needs. I was just using the sample you'd posted to play around with. Glad it helped. – Eric Brandt Mar 19 '19 at 22:13
3

You can use a similar query:

SELECT * FROM (SELECT [server_name]
      ,[sessionID]
      ,[user]
      ,[ElapsedTime]
      ,ROW_NUMBER() OVER(PARITION BY sessionID ORDER BY sessionID) rn
from table) T WHERE rn = 1

OR

WITH CTE_1 AS(SELECT [server_name]
          ,[sessionID]
          ,[user]
          ,[ElapsedTime]
          ,ROW_NUMBER() OVER(PARITION BY sessionID ORDER BY sessionID) rn
    from table) SELECT * FROM CTE_1 WHERE rn = 1
Hadi
  • 36,233
  • 13
  • 65
  • 124
  • @Cataster, this does exactly the same thing as the edited version in my answer. Mostly just a matter of which format you prefer. (+1, by the way, for getting yours up first, Hadi.) – Eric Brandt Mar 19 '19 at 21:45
  • very interesting. i would have never known about this – Cataster Mar 19 '19 at 21:53