
I have multiple *.csv files that I want to concatenate into a single CSV file with a PowerShell script. All the CSV files have the same header (the first line), so when concatenating them I want to keep the header only from the first file.

How can I do that?

Octavian Guzu

2 Answers


Note: The solution in this answer intentionally uses plain-text processing, for two reasons:

  • Use of Import-Csv and Export-Csv incurs significant processing overhead (though that may not matter in a given situation); plain-text processing is significantly faster.

  • In Windows PowerShell and PowerShell [Core] 6.x, the output will invariably have double-quoted column values, even if they weren't quoted in the input (though that should normally not matter).

    • In PowerShell [Core] 7.0+ Export-Csv and ConvertTo-Csv now have a -UseQuotes parameter that allows you to control quoting in the output.

That said, Import-Csv and Export-Csv are certainly the better choice whenever you need to read and interpret the data (as opposed to just copying it elsewhere) - see Sid's helpful answer.
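To illustrate the quoting behavior mentioned above, here is a small sketch (PowerShell 7.0+ only; the sample object is made up for demonstration):

```powershell
# Hypothetical sample row; -UseQuotes AsNeeded quotes a field only
# when it contains a separator, quote, or newline (PowerShell 7.0+).
[pscustomobject] @{ Name = 'one'; Value = 1 } |
  ConvertTo-Csv -NoTypeInformation -UseQuotes AsNeeded
```

In 6.x and Windows PowerShell, by contrast, every field in the output is invariably double-quoted.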


# The single output file.
# Note: Best to save this in a different folder than the input
#       folder, in case you need to run multiple times.
$outFile = 'outdir/out.csv'

# Get all input CSV files as an array of file-info objects,
# from the current dir. in this example
$inFiles = @(Get-ChildItem -Filter *.csv)

# Extract the header line (column names) from the first input file
# and write it to the output file.
Get-Content $inFiles[0] -First 1 | Set-Content -Encoding Utf8 $outFile

# Process all input files and append their *data* rows to the
# output file (that is, skip the header row).
# NOTE: If you only wanted to extract a given count $count of data rows
#       from each file, add -First ($count+1) to the Get-Content call.
foreach ($file in $inFiles) {
  Get-Content $file.FullName | Select-Object -Skip 1 |
    Add-Content -Encoding Utf8 $outFile
}

Note the use of -Encoding Utf8 as an example; adjust as needed. By default, Set-Content uses "ANSI" encoding in Windows PowerShell, and BOM-less UTF-8 in PowerShell [Core].

Caveat: By doing line-by-line plain-text processing, you're relying on each text line representing a single CSV data row; this is typically true, but doesn't have to be.

Conversely, if performance is paramount, the plain-text approach above could be made significantly faster with direct use of .NET methods such as [IO.File]::ReadLines() or, if the files are small enough, even [IO.File]::ReadAllLines().
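A sketch of such a .NET-based variant, assuming the same $inFiles array as above (note that .NET methods require absolute paths, hence the use of .FullName and $PWD):

```powershell
# Collect all lines, keeping the header line only from the first file.
$outPath = Join-Path $PWD 'outdir/out.csv'
$allLines = foreach ($i in 0..($inFiles.Count - 1)) {
  $skip = if ($i -eq 0) { 0 } else { 1 }  # skip header except in first file
  # ReadLines() streams lazily, avoiding loading whole files into memory.
  [IO.File]::ReadLines($inFiles[$i].FullName) | Select-Object -Skip $skip
}
[IO.File]::WriteAllLines($outPath, [string[]] $allLines)
```

For maximum speed you could additionally replace the Select-Object pipeline stage with direct enumeration, at the cost of more verbose code.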

mklement0

You could do it like this:

(Get-ChildItem -Path $path -Filter *.csv).FullName | Import-Csv | Export-Csv $path\concatenated.csv -NoTypeInformation

Here, $path is the folder containing the CSV files; the combined CSV file is written to that same folder.

Sid
  • If performance and preserving the quoting status of the fields aren't a concern, this is certainly the most convenient option. I suggest mentioning that explicit use of `-Encoding` may be needed on output, given that (a) the input encoding isn't preserved (it never is in PowerShell) and (b) `Export-Csv` defaults to ASCII(!) in Windows PowerShell, which can lead to data loss. As an aside: the `.FullName` workaround is fortunately no longer needed in PowerShell [Core] v6.2.3+. – mklement0 Feb 11 '20 at 15:11