1

I am currently working on a small PowerShell script that is supposed to split a CSV file with translations into individual files for the respective language. For this I use the Import-Csv cmdlet to import the source file, which has about this format:

ResourceId;GermanTranslation;EnglishTranslation;Check
0;Deutscher Text;English text;OK
1;  mit Leerzeichen  ;  with spaces  ;OK

The goal is to get a line-by-line representation of the translations in the format ResourceId|EnglishTranslation|. For this I have built the following script:

Set-Variable SOURCE_FILE -Option Constant -Value ".\sourceFile.csv"
Set-Variable RESULT_FILE -Option Constant -Value ".\resultFile.csv"

foreach ($row in (Import-Csv -Path $SOURCE_FILE -Delimiter ";")) {
    Add-Content -Path $RESULT_FILE -Value ($row.RessourceId + "|" + $row.EnglishTranslation + "|")
}

Basically, everything works as desired, but when I examined the results, I noticed that the spaces with which some elements begin were no longer present in the result:

0|English text|
1|with spaces  |

Unfortunately I didn't find a parameter in the MS documentation that addresses this problem, so I was unsure at first. After that, I took a look at RFC 4180, which describes the CSV file format more or less exactly. It states that spaces should be considered as part of a field and not ignored. Should means not must, so it may well be that there really is no option for this.

Is there a possibility to preserve the spaces without me having to parse the whole file on my own?

Community
  • 1
  • 1
mju
  • 591
  • 1
  • 6
  • 17
  • You may try the `Import-Csv` from the PowerShell version 7 as this has some more options. Another option could be to *treat* your input csv before using it to enclose cell content in quotes to preserve the spaces as needed. ;-) – Olaf Apr 09 '21 at 14:36
  • If you have control of the upstream file (which you obviously might not :-S) you'd ideally generate it with strings quoted - e.g. ```1;" mit Leerzeichen ";" with spaces ";OK``` which *does* get preserved with ```Import-Csv```. – mclayton Apr 09 '21 at 14:43
  • Even in PS 7 there is no parameter to control whitespace behaviour of `Import-Csv`. You may try a .NET class instead, see https://stackoverflow.com/a/33796861/7571258 – zett42 Apr 09 '21 at 14:52

3 Answers3

3

Here is a solution that should be more robust (and propably faster) than replacing characters from the CSV file.

It uses the .NET TextFieldParser class from Microsoft.VisualBasic assembly. It has a TrimWhiteSpace attribute which, when set to $false, preserves any leading and trailing whitespace of each field, even if the field is not enclosed in double quotation marks.

I've encapsulated the .NET code in a function named Import-CustomCsv. It supports some additional options of TextFieldParser through various parameters.

Function Import-CustomCsv {
   [CmdletBinding()]
   param (
      [Parameter(Mandatory, ValueFromPipeline)] [String] $Path,
      [String[]] $Delimiter = ',',
      [String[]] $CommentTokens = '#',
      [switch] $TrimWhiteSpace,
      [switch] $HasFieldsEnclosedInQuotes
   )

   # Load Visual Basic assembly if necessary. Seems to be required only for PS 5.
   if (-not ([System.Management.Automation.PSTypeName]'Microsoft.VisualBasic.FileIO.TextFieldParser').Type) {
      Add-Type -AssemblyName 'Microsoft.VisualBasic, Version=10.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a'
   }   

   # Create a CSV parser
   $csvParser = New-Object Microsoft.VisualBasic.FileIO.TextFieldParser $Path

   try {
      # Set CSV parser options
      $csvParser.SetDelimiters( $Delimiter )
      $csvParser.CommentTokens = $CommentTokens
      $csvParser.TrimWhiteSpace = $TrimWhiteSpace
      $csvParser.HasFieldsEnclosedInQuotes = $HasFieldsEnclosedInQuotes

      # Read the header
      $header = $csvParser.ReadFields()

      while( -not $csvParser.EndOfData ) {
         # Read current line fields, pointer moves to the next line.
         $fields = $csvParser.ReadFields()

         # Associate each field with its name from the header by storing it in an
         # ordered hashtable.
         $namedFields = [ordered]@{}
         for( $i = 0; $i -lt $header.Count; $i++ ) {
            $namedFields[ $header[ $i ] ] = $fields[ $i ]
         }

         # Finally convert fields to PSCustomObject and output (implicitly)
         [PSCustomObject] $namedFields
      }
   }
   finally {
      # Make sure to close the file even in case of an exception.
      $csvParser.Close()
   }
}

Usage example: Parse CSV, preserving whitespace:

Import-CustomCsv test.csv -Delimiter ';'

Output:

ResourceId GermanTranslation   EnglishTranslation Check
---------- -----------------   ------------------ -----
0          Deutscher Text      English text       OK
1            mit Leerzeichen     with spaces      OK

Usage example: Parse CSV, trimming whitespace (like Import-Csv):

Import-CustomCsv test.csv -Delimiter ';' -TrimWhiteSpace

Output:

ResourceId GermanTranslation EnglishTranslation Check
---------- ----------------- ------------------ -----
0          Deutscher Text    English text       OK   
1          mit Leerzeichen   with spaces        OK   

Note:

The above two samples keep field-enclosing double-quotation marks (if any) in the output. To remove double-quotation marks around fields, pass parameter -HasFieldsEnclosedInQuotes.

zett42
  • 25,437
  • 3
  • 35
  • 72
  • This is great +1 for nice improvement over native `import-csv`, should be tagged as resolved already. Sadly, the one asking the question doesn't seem to be much appreciative. – Santiago Squarzon Apr 10 '21 at 22:58
  • Wow, a really great solution with pleasantly detailed examples! This actually works. (Sorry I didn't notice the answers until now - it's now marked as solved). – mju Apr 12 '21 at 06:42
2

One way to to avoid the default trimming of text would be to convert spaces to another character first before treating it as a CSV. Not sure which character would be the best to use but here is an example using underscore:

Get-Content $SOURCE_FILE | %{
    $_ -replace ' ', '_'
} | ConvertFrom-Csv -Delimiter ';' | %{
    '{0}|{1}|' -f $_.ResourceId, $_.EnglishTranslation
} | % {$_ -replace '_', ' '} | Out-File $RESULT_FILE 
Dave Sexton
  • 10,768
  • 3
  • 42
  • 56
1

This should work as you intend. Try it and let me know.

Basically this is doing what @mclayton recommended in the comments.

$content=Get-Content ./test.csv|%{
    '"{0}"' -f ($_ -replace ';','";"')
}|ConvertFrom-Csv -Delimiter ';'


PS /~> $content


ResourceId GermanTranslation   EnglishTranslation Check
---------- -----------------   ------------------ -----
0          Deutscher Text      English text       OK
1            mit Leerzeichen     with spaces      OK
Santiago Squarzon
  • 41,465
  • 5
  • 14
  • 37