0

I would like to remove duplicates in a CSV file using PowerShell. I know that there are posts about this already but I can't seem to find one that helps.

I'm trying to merge 2 CSV Files that have the same header and then remove the duplicates of the resulting file based on the IDs listed in the first column and then put it to the same CSV file.

The properties of the file are as follows: enter image description here

And when I try to use the sort and unique method, I get the following (not a table: enter image description here

Here is my code so far: enter image description here

####
#MERGE
$getFirstLine = $true
    get-childItem "C:\IGHandover\Raw\IG_INC*.csv"| foreach {
    $filePath = $_
    $lines =  $lines = Get-Content $filePath  
    $linesToWrite = switch($getFirstLine) {
           $true  {$lines}
           $false {$lines | Select -Skip 1}
    }
    $getFirstLine = $false
    Add-Content "C:\IGHandover\new.csv" $linesToWrite
    }

####
#REMOVE DUPLICATES
Import-Csv "C:\IGHandover\new.csv" | Sort inc_number -Unique |
    Set-Content "C:\IGHandover\new.csv"
TylerH
  • 20,799
  • 66
  • 75
  • 101
  • Please add code and not images as it's more difficult to reproduce it. – Manu Nov 06 '17 at 09:06
  • Hi Manu. Here is what I have so far: #### #MERGE $getFirstLine = $true get-childItem "C:\IGHandover\Raw\IG_INC*.csv"| foreach { $filePath = $_ $lines = $lines = Get-Content $filePath $linesToWrite = switch($getFirstLine) { $true {$lines} $false {$lines | Select -Skip 1} } $getFirstLine = $false Add-Content "C:\IGHandover\new.csv" $linesToWrite } #### #REMOVE DUPLICATES Import-Csv "C:\IGHandover\new.csv" | Sort inc_number -Unique | Set-Content "C:\IGHandover\new.csv" – Trizia Dimalanta Nov 06 '17 at 09:11
  • Trizia, add the code (code + results) in your question, not in the comment. Replace the confidential informations in the question. – Manu Nov 06 '17 at 09:13
  • If you want to see your `Import-Csv "C:\IGHandover\new.csv" | Sort inc_number -Unique` to display data in tabular format then `Format-Table -AutoSize` is what you are looking for. But that is just for your representation purpose on the Shell screen. What is it exactly you are looking for? Do the `Sort` and `Unique` properties don't work correctly for you? – Vivek Kumar Singh Nov 06 '17 at 09:30
  • Your immediate problem is that you are using `Set-Content` io `Export-Csv -NoClobber`. Solving that takes you to another hurdle: you are writing to a file you are still reading from. That can be solved by adding brackets around the import but much easier is to write to a new file. – Lieven Keersmaekers Nov 06 '17 at 09:42
  • Hi Vivek. It works fine but I need to write it in a new CSV File. The merging part works great but I'm having trouble removing the duplicates – Trizia Dimalanta Nov 06 '17 at 09:45
  • As @LievenKeersmaekers pointed out, instead of `Set-Content`, you can use `Export-Csv -NoClobber` if you don't want to over-write the file. Can you try this and let us know - `Import-Csv "C:\IGHandover\new.csv" | Sort inc_number -Unique | Export-Csv "C:\IGHandover\new.csv" -NoTypeInformation -NoClobber` – Vivek Kumar Singh Nov 06 '17 at 09:51
  • Hi @LievenKeersmaekers and Vivek, it worked well! I just added "-force" since I'm getting errors without it. I'll post the solution in a while. Thanks a lot! – Trizia Dimalanta Nov 06 '17 at 10:12

3 Answers3

1

Don't use Get-Content or Set-Content to import or export csv file

Import-Csv (Get-ChildItem 'C:\IGHandover\Raw\IG_INC*.csv') |         
        Sort-Object -Unique inc_number |
            Export-Csv 'C:\IGHandover\new.csv' -NoClobber -NoTypeInformation
Vincent K
  • 1,326
  • 12
  • 19
0

I guess you want to update a table (HandoverINC.csv) with records from a new table (New.csv), replacing any records in the HandoverINC.csv with the same primary key (inc_number) from the New.csv in the HandoverINC.csv. And add any new records in the New.csv to the HandoverINC.csv (Basically what is called a Full Join in SQL).

Using the Join-Object described at: https://stackoverflow.com/a/45483110/1701026

Import-CSV .\HandoverINC.csv | FullJoin (Import-CSV .\New.csv) inc_number {$Right.$_} | Export-CSV .\HandoverINC.csv
iRon
  • 20,463
  • 10
  • 53
  • 79
0

As suggested by Lieven Keersmaekers and Vivek Kumar, I've made a few changes in my code:

  • Put the merged contents to a temporary file
  • Import the csv file with the merge contents
  • Sort the column of reference and use the unique parameter
  • Export the results to a new csv file

I found that my code was similar to Vincent K's:

#MERGE
$getFirstLine = $true
get-childItem "C:\IGHandover\Raw\IG_INC*.csv"|
foreach {
    $filePath = $_
    $lines =  $lines = Get-Content $filePath  
    $linesToWrite = switch($getFirstLine) {
    $true  {$lines}
    $false {$lines | Select -Skip 1}}
    $getFirstLine = $false
    Add-Content "C:\IGHandover\HandoverINCtemp.csv" $linesToWrite }

#REMOVE DUPLICATES
Import-Csv "C:\IGHandover\HandoverINCtemp.csv" | Sort inc_number -Unique |
    Export-Csv "C:\IGHandover\HandoverINC.csv" -NoClobber -NoTypeInformation -Force
    Remove-Item "C:\IGHandover\HandoverINCtemp.csv"

To simplify (merging and removing duplicates with the same header), as suggested by Vincent:

Import-Csv (Get-ChildItem "C:\IGHandover\Raw\IG_INC*.csv") | Sort inc_number -Unique |
    Export-Csv "C:\IGHandover\HandoverINC.csv" -NoClobber -NoTypeInformation -Force

I hope this helps anyone who'd like to do the same with their files

  • this code will merge your csv file: "Import-Csv (Get-ChildItem 'C:\IGHandover\Raw\IG_INC*.csv')"....all csv files have the same header – Vincent K Nov 06 '17 at 11:01