I have a log file that is formatted as a CSV with no headers. The first column is basically the unique identifier for the issues being recorded. There may be multiple lines with different details for the same issue identifier. I would like to remove lines where the first column is duplicated because I don't need the other data at this time.

I have fairly basic knowledge of PowerShell at this point, so I'm sure there's something simple I'm missing.

I'm sorry if this is a duplicate; I could find questions that answered parts of my question, but not the question as a whole.

So far, my best guess is:

Import-Csv $outFile | % { Select-Object -Index 1 -Unique } | Out-File $outFile -Append

But this gives me the error:

Import-Csv : The member "LB" is already present.
At C:\Users\jnurczyk\Desktop\Scratch\POImport\getPOImport.ps1:6 char:1
+ Import-Csv $outFile | % { Select-Object -InputObject $_ -Index 1 -Unique } | Out ...
+ ~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotSpecified: (:) [Import-Csv], ExtendedTypeSystemException
    + FullyQualifiedErrorId : AlreadyPresentPSMemberInfoInternalCollectionAdd,Microsoft.PowerShell.Commands.ImportCsvCommand

Joshua Nurczyk

3 Answers


Because your data has no headers, you need to supply them with the -Header parameter of the Import-Csv cmdlet. Then, to keep only the records that are unique in the first column, name that column in the Select-Object cmdlet. See code below:

Import-Csv $outFile -Header A,B,C | Select-Object -Unique A

To clarify, the headers in my example are A, B, and C. This works if you know how many columns there are. If you have too few headers, then columns are dropped. If you have too many headers, then they become empty fields.
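For example, here is a minimal end-to-end sketch; the file name, column names, and sample data are made up for illustration:

```powershell
# Create a small headerless sample (hypothetical file name and data)
@'
101,open,disk full
101,assigned,to J. Smith
102,open,login failure
'@ | Set-Content .\issues.csv

# Supply the column names, then keep one record per unique first column
Import-Csv .\issues.csv -Header Id,Status,Detail |
    Select-Object -Unique Id
```

Note that this keeps only the Id property; -Unique compares the selected properties, so selecting all three columns would instead de-duplicate whole rows rather than identifiers.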

Dave F
Benjamin Hubbard
  • I tried something similar, but couldn't figure out where to put the header name (A in your example). The only issue I have is that it appends a bunch of spaces to the end of each line when I output to a file. While annoying, I don't mind this much. – Joshua Nurczyk Dec 11 '13 at 19:31
  • 1
    Try using `trim()` to remove spaces. – Benjamin Hubbard Dec 11 '13 at 19:35
  • trim() doesn't work on non-string objects. I might have to figure out how to make it interpret the output of Select-Object as a string. – Joshua Nurczyk Dec 11 '13 at 19:52
  • Not sure where your whitespace is coming from, but check this out: http://stackoverflow.com/questions/17180955/trim-object-contents-in-csv-import – Benjamin Hubbard Dec 11 '13 at 20:26
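The trailing spaces mentioned in the comments come from Out-File formatting objects as a padded table. One way to avoid that is to expand the property to plain strings before writing; this is a sketch, and the variable and output file names are assumptions:

```powershell
# Expand the Id values to strings so no table padding is written
Import-Csv $outFile -Header A,B,C |
    Select-Object -Unique A |
    ForEach-Object { $_.A.Trim() } |
    Set-Content .\unique-ids.txt
```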

Every time I look for a solution to this issue I run across this thread. However, the solution accepted here is more generic than I would like. The function below increments a counter each time it sees the same header name, producing headers like A, B, C, A1, D, A2, C1, etc.

Function Import-CSVCustom ($csvTemp) {
    # Read the first line of the file to get the raw header names
    $StreamReader = New-Object System.IO.StreamReader -Arg $csvTemp
    [array]$Headers = $StreamReader.ReadLine() -Split "," | % { "$_".Trim() } | ? { $_ }
    $StreamReader.Close()

    # De-duplicate header names by appending a running count: A, B, A -> A, B, A1
    $a = @{}
    $Headers = $Headers | % {
        if ($a.$_.Count) { "$_$($a.$_.Count)" } else { $_ }
        $a.$_ += @($_)
    }

    # -Header makes Import-Csv treat the original header row as data,
    # so skip that first record
    Import-Csv $csvTemp -Header $Headers | Select-Object -Skip 1
}
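A quick usage sketch of the function above; the file path is an assumption:

```powershell
# Import a CSV whose header row contains duplicate column names
$rows = Import-CSVCustom 'C:\logs\export.csv'
$rows | Get-Member -MemberType NoteProperty   # lists the de-duplicated headers
```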
  • Just mentioning that `System.IO.StreamReader` requires .NET. – plaes Mar 06 '17 at 12:20
  • 1
    Just a heads up, this solution isn't compliant with RFC-4180. It will break if the headers are escaped and contain commas. tools.ietf.org/html/rfc4180 – Joe the Coder Dec 04 '17 at 16:00

To expand upon Benjamin Hubbard's post, here is a little SQL script (assuming that you will be inserting this data into a table in a database, of course!) that I use to generate the -Header argument for my script:

SELECT
        '-Header '
            + STUFF((SELECT
                    ',' + QUOTENAME(COLUMN_NAME, '"')
                    + CASE WHEN C.ORDINAL_POSITION % 5 = 0 THEN ' `' + CHAR(13) + CHAR(10) ELSE '' END
                FROM
                    INFORMATION_SCHEMA.COLUMNS C
                WHERE
                    TABLE_NAME = '<Staging Table Name>'
                ORDER BY
                    C.ORDINAL_POSITION
            FOR XML PATH (''), TYPE).value('.', 'nvarchar(max)'), 1, 1, '')
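For a hypothetical staging table with columns Id, Name, and CreatedOn, the query above produces a single string along the lines of:

```
-Header "Id","Name","CreatedOn"
```

with a PowerShell backtick line continuation inserted after every fifth column so long header lists stay readable in the script.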
Mark Kram