4

I am collecting performance counters from NetApp Performance Manager software (OPM). OPM saves 30 days worth of data in MySQL database. So i have to put two queries to retrieve the data:

  1. First Query on 30th of every month and save in csv file.
  2. Second Query on 1st of every month and save in csv file.

Then merge the two csv files to get data if there are 31 days in a month.

Both files look like below:

"Time","objid","cpuBusy","cifsOps","avgLatency"
"2016:06:04 00:04","72","50.6196","2069.11","7622.1"
"2016:06:04 00:09","72","30.2233","2075.94","7633.27"
"2016:06:04 00:14","72","35.2559","1980.64","8352.17"

When i merge the two csv files with below code. I get duplicate rows with data from same data/time.

@(Import-Csv au2004npsa003-mm-business.csv) + @(Import-Csv au2004npsa003-nn-business.csv) | export-csv joined.csv -NoTypeInformation

How can i merge the two csv files without getting duplicate data? I have tried select -unique however, it gives just one row.

Nitish
  • 43
  • 1
  • 1
  • 3

3 Answers3

7

As for why Select-Object -Unique didn't work:

  • Select-Object -Unique, when given instances of reference types (other than strings), compares their .ToString() values in order to determine uniqueness.

  • [pscustomobject] instances, such as the ones Import-Csv creates, regrettably return the empty string from their .ToString() method.

    • This long-standing bug, still present as of PowerShell (Core) 7.2, was originally reported in GitHub issue #6163.

Thus, all input objects compare the same, and only the first input object is ever returned.

S9uare's helpful Select-Object -Property * -Unique approach overcomes this problem by forcing all properties to be compared invidually, but comes with a performance caveat:
The input objects are effectively recreated, and comparing all property values is overkill in this case, because comparing Time values would suffice; with large input files, processing can take a long time.


Since the data at hand comes from CSV files, the performance problem can be helped with string processing, using Get-Content rather than Import-Csv:

Get-Content au2004npsa003-mm-business.csv, au2004npsa003-nn-business.csv |
  Select-Object -Unique |
    Set-Content -Encoding ASCII joined.csv

Note that I'm using -Encoding ASCII to mimic Export-Csv's default behavior; change as needed.

With input objects that are strings, Select-Object -Unique works as expected - and is faster.
Note, however, that with large input files that you may run out of memory, given that Select-Object needs to build up an in-memory data structure containing all rows in order to determine uniqueness.

mklement0
  • 382,024
  • 64
  • 607
  • 775
3

Select -Unique is actually the correct way, you just need a little bit more. Which is -Property * or simply *

As Select -Unique does not automatically compares all the properties, you need to specify * so that it compares the whole object.

$csv1 = Import-Csv -Path ".\csv1.csv"
$csv2 = Import-Csv -Path ".\csv2.csv"

$merged = $csv1 + $csv2

$merged | Select -Property * -Unique
S9uare
  • 56
  • 4
  • 2
    This is ingeniously simple but comes with a performance caveat: the input objects are effectively recreated, and you invariably have to compare _all_ property values, even though comparing `"Time"` values would suffice; with large input files, processing can take a long time. Quibble: not an issue in this case, but it's generally safer to use `@($csv1) + $csv2` to handle the case where `$csv1` happens to contain a single object only. (To lower memory use, also consider sending the objects directly through the pipeline instead: `Import-Csv .\csv1.csv, .\csv2.csv | Select -Property * -Unique`) – mklement0 Jul 14 '16 at 00:09
0

made it complete script just run it

#** ### Merged CSV to Unique


#csv1
   # name,id, birth
  #John,007,1999
  #john,008,1999
  #john,709,1999


#csv2

  # name,id, birth
  #John,006,1999
  #john,005,1999
  #john,709,1999

#After Merged

  #John,007,1999
  #john,008,1999
  #**john,709,1999
  #John,006,1999
  #john,005,1999


  #  >>> Only one Unique Files exits john,709,1999


  ##### Just merge CSV example

#csv1
   # name,id, birth
  #John,007,1999
  #john,008,1999
  #john,709,1999

  #csv2
  # name,id, birth
   #John,006,1999
  #john,005,1999
  #john,709,1999


  ## after Merged

   #name,id, birth
  #John,007,1999
  #john,008,1999
  #john,709,1999
  #John,006,1999
  #john,005,1999
  #john,709,1999




 ## Usage

$csv1 = Import-Csv -Path "D:\CSV1.csv" #change the path to where csv one exists
$csv2 = Import-Csv -Path "D:\CSV2.csv" #change the path to where other exists
$UniQexpopath= 'D:\uniqueMerged.csv'  #change the path to where you want to export unique csv
$expopath= 'D:\merged.csv' #change the path to where you want to export Merged csv



  Write-Host "Would you like to Unique Merge the CSV?"        -ForegroundColor Green 
$Response = Read-Host "[Y] Yes, [N] No" 
    If($Response -eq "y") 
{write-host "Merging Csv please wait" -foregroundcolor Red

$merged | Select -Property * -Unique | Export-csv -path $UniQexpopath -NoTypeInformation 
write-host "Merging completed please check $UniQexpopath" -foregroundcolor Green}


  else{ write-host "Merging Csv please wait" -foregroundcolor Red
  $merged = $csv1 + $csv2

$merged | Export-csv -path $expopath -NoTypeInformation
write-host "Merging completed please check $expopath" -foregroundcolor Green}
DisplayName
  • 1,008
  • 10
  • 24
  • try the script by s9suare it works add export csv copy the below script `$csv1 = Import-Csv -Path "D:\CSV1.csv" $csv2 = Import-Csv -Path "D:\CSV2.csv" $merged = $csv1 + $csv2 $merged | Select -Property * -Unique | Export-csv -path D:\Merged.csv -NoTypeInformation` – DisplayName Jul 13 '16 at 05:07