How to preserve spaces after a CSV import using a powershell script?

Question

I am currently working on a small PowerShell script that is supposed to split a CSV file with translations into individual files for the respective language. For this I use the Import-Csv cmdlet to import the source file, which has about this format:

ResourceId;GermanTranslation;EnglishTranslation;Check
0;Deutscher Text;English text;OK
1;  mit Leerzeichen  ;  with spaces  ;OK

The goal is to get a line-by-line representation of the translations in the format ResourceId|EnglishTranslation|. For this I have built the following script:

Set-Variable SOURCE_FILE -Option Constant -Value ".\sourceFile.csv"
Set-Variable RESULT_FILE -Option Constant -Value ".\resultFile.csv"

foreach ($row in (Import-Csv -Path $SOURCE_FILE -Delimiter ";")) {
    Add-Content -Path $RESULT_FILE -Value ($row.RessourceId + "|" + $row.EnglishTranslation + "|")
}

Basically, everything works as desired, but when I examined the results, I noticed that the spaces with which some elements begin were no longer present in the result:

0|English text|
1|with spaces  |

Unfortunately I didn't find a parameter in the MS documentation that addresses this problem, so I was unsure at first. After that, I took a look at RFC 4180, which describes the CSV file format more or less exactly. It states that spaces should be considered as part of a field and not ignored. Should means not must, so it may well be that there really is no option for this.

Is there a possibility to preserve the spaces without me having to parse the whole file on my own?

You may try the `Import-Csv` from the PowerShell version 7 as this has some more options. Another option could be to *treat* your input csv before using it to enclose cell content in quotes to preserve the spaces as needed. ;-) — Olaf, Apr 09 '21 at 14:36
If you have control of the upstream file (which you obviously might not :-S) you'd ideally generate it with strings quoted - e.g. ```1;" mit Leerzeichen ";" with spaces ";OK``` which *does* get preserved with ```Import-Csv```. — mclayton, Apr 09 '21 at 14:43
Even in PS 7 there is no parameter to control whitespace behaviour of `Import-Csv`. You may try a .NET class instead, see https://stackoverflow.com/a/33796861/7571258 — zett42, Apr 09 '21 at 14:52

zett42 · Accepted Answer · 2021-04-11T13:23:46.123

Here is a solution that should be more robust (and propably faster) than replacing characters from the CSV file.

It uses the .NET TextFieldParser class from Microsoft.VisualBasic assembly. It has a TrimWhiteSpace attribute which, when set to $false, preserves any leading and trailing whitespace of each field, even if the field is not enclosed in double quotation marks.

I've encapsulated the .NET code in a function named Import-CustomCsv. It supports some additional options of TextFieldParser through various parameters.

Function Import-CustomCsv {
   [CmdletBinding()]
   param (
      [Parameter(Mandatory, ValueFromPipeline)] [String] $Path,
      [String[]] $Delimiter = ',',
      [String[]] $CommentTokens = '#',
      [switch] $TrimWhiteSpace,
      [switch] $HasFieldsEnclosedInQuotes
   )

   # Load Visual Basic assembly if necessary. Seems to be required only for PS 5.
   if (-not ([System.Management.Automation.PSTypeName]'Microsoft.VisualBasic.FileIO.TextFieldParser').Type) {
      Add-Type -AssemblyName 'Microsoft.VisualBasic, Version=10.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
   }   

   # Create a CSV parser
   $csvParser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser $Path

   try {
      # Set CSV parser options
      $csvParser.SetDelimiters( $Delimiter )
      $csvParser.CommentTokens = $CommentTokens
      $csvParser.TrimWhiteSpace = $TrimWhiteSpace
      $csvParser.HasFieldsEnclosedInQuotes = $HasFieldsEnclosedInQuotes

      # Read the header
      $header = $csvParser.ReadFields()

      while( -not $csvParser.EndOfData ) {
         # Read current line fields, pointer moves to the next line.
         $fields = $csvParser.ReadFields()

         # Associate each field with its name from the header by storing it in an
         # ordered hashtable.
         $namedFields = [ordered]@{}
         for( $i = 0; $i -lt $header.Count; $i++ ) {
            $namedFields[ $header[ $i ] ] = $fields[ $i ]
         }

         # Finally convert fields to PSCustomObject and output (implicitly)
         [PSCustomObject] $namedFields
      }
   }
   finally {
      # Make sure to close the file even in case of an exception.
      $csvParser.Close()
   }
}

Usage example: Parse CSV, preserving whitespace:

Import-CustomCsv test.csv -Delimiter ';'

Output:

ResourceId GermanTranslation   EnglishTranslation Check
---------- -----------------   ------------------ -----
0          Deutscher Text      English text       OK
1            mit Leerzeichen     with spaces      OK

Usage example: Parse CSV, trimming whitespace (like Import-Csv):

Import-CustomCsv test.csv -Delimiter ';' -TrimWhiteSpace

Output:

ResourceId GermanTranslation EnglishTranslation Check
---------- ----------------- ------------------ -----
0          Deutscher Text    English text       OK   
1          mit Leerzeichen   with spaces        OK

Note:

The above two samples keep field-enclosing double-quotation marks (if any) in the output. To remove double-quotation marks around fields, pass parameter -HasFieldsEnclosedInQuotes.

This is great +1 for nice improvement over native `import-csv`, should be tagged as resolved already. Sadly, the one asking the question doesn't seem to be much appreciative. — Santiago Squarzon, Apr 10 '21 at 22:58
Wow, a really great solution with pleasantly detailed examples! This actually works. (Sorry I didn't notice the answers until now - it's now marked as solved). — mju, Apr 12 '21 at 06:42

score 2 · Answer 2 · answered Apr 09 '21 at 14:52

One way to to avoid the default trimming of text would be to convert spaces to another character first before treating it as a CSV. Not sure which character would be the best to use but here is an example using underscore:

Get-Content $SOURCE_FILE | %{
    $_ -replace ' ', '_'
} | ConvertFrom-Csv -Delimiter ';' | %{
    '{0}|{1}|' -f $_.ResourceId, $_.EnglishTranslation
} | % {$_ -replace '_', ' '} | Out-File $RESULT_FILE

score 1 · Answer 3 · answered Apr 09 '21 at 14:50

This should work as you intend. Try it and let me know.

Basically this is doing what @mclayton recommended in the comments.

$content=Get-Content ./test.csv|%{
    '"{0}"' -f ($_ -replace ';','";"')
}|ConvertFrom-Csv -Delimiter ';'


PS /~> $content


ResourceId GermanTranslation   EnglishTranslation Check
---------- -----------------   ------------------ -----
0          Deutscher Text      English text       OK
1            mit Leerzeichen     with spaces      OK

How to preserve spaces after a CSV import using a powershell script?

3 Answers3

Linked