
I have the following records in a file:

 id user        mart  desc                                                                                                
 -- ----        ----- --------                                                                                                
  1 test_1          3 test result 1...
  2 test_2          3 test result 2

and I want to write it to a .csv file in PowerShell with the following command:

Get-Content .\test | ConvertTo-Csv -NoTypeInformation | Out-File txt.csv

I expect a CSV file with the following content:

id,user,mart,desc
1,test_1,3,test result 1
2,test_2,3,test result 2

but somehow the output CSV file looks like this:

"PSPath","PSParentPath","PSChildName","PSDrive","PSProvider","ReadCount","Length"
"C:\test","C:\question","test","C","Microsoft.PowerShell.Core\FileSystem","1","126"
"C:\test","C:\question","test","C","Microsoft.PowerShell.Core\FileSystem","2","126"
"C:\test","C:\question","test","C","Microsoft.PowerShell.Core\FileSystem","3","126"
"C:\test","C:\question","test","C","Microsoft.PowerShell.Core\FileSystem","4","126"

Does anyone have an idea why it didn't export the proper values to CSV?

Andie

1 Answer


Background information:

  • ConvertTo-Csv and Export-Csv operate on (complex) objects and their properties, not on the strings that Get-Content returns from plain-text files.

    • [string] instances have just one property, .Length, and their content isn't a property, so converting strings to CSV is pointless, because you'll get a single output column named Length that simply contains the length (character count) of each input string; e.g.:

      'abc', 'de' | ConvertTo-Csv -NoTypeInformation
      
      • Output (the values of the only column are the lengths of the input strings):

        "Length"
        "3"
        "2"
        
    • The reason you have additional columns in your (useless) CSV output is that the strings returned by Get-Content are decorated with ETS (Extended Type System) properties such as .PSPath containing metadata about the file that the lines were read from.

      • For the sake of completeness: You could bypass these additional properties by inserting a ForEach-Object ToString segment after your Get-Content call (but it still won't get you useful CSV output).
  • Therefore, to provide meaningful input to ConvertTo-Csv / Export-Csv based on a file, that file's content must be parsed into objects with properties, which these cmdlets then convert to CSV data with columns named for the properties, and field values based on the properties' values.

    • Only text files using a structured format for programmatic processing allow such parsing to be performed robustly; examples of such formats are JSON, XML, and CSV itself.

    • By contrast, your text file uses a for-display format for the human observer, which is not suitable for robust programmatic processing; case in point: the ... at the end of the first value in the desc column suggest that the column value was truncated.
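The points above can be demonstrated quickly; the following is a sketch that creates a throwaway sample file (`sample.txt` is a made-up name) and shows both the ETS-decorated output and the `ForEach-Object ToString` bypass mentioned earlier:

```powershell
# Create a throwaway plain-text sample file.
'line one', 'line two' | Set-Content sample.txt

# Without the bypass, the ETS metadata properties become the CSV columns
# (PSPath, PSParentPath, PSChildName, ...) - see the question's output.
Get-Content sample.txt | ConvertTo-Csv -NoTypeInformation | Select-Object -First 1

# With the ForEach-Object ToString bypass, only the intrinsic [string]
# property, .Length, remains - still not useful CSV output:
Get-Content sample.txt | ForEach-Object ToString | ConvertTo-Csv -NoTypeInformation
# "Length"
# "8"
# "8"
```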


Solution:

  • In principle, if you get to control how your input file is created, create it in a structured text format, as noted.

    • Notably, do not use the for-display format that results from redirecting command output to a file with > or Out-File; neither should you use any of the Format-* cmdlets (which > and Out-File implicitly use).

    • See this answer and this one for more information.

    • Of course, if you're free to choose a structured format, you could just create your file in CSV format to begin with.

  • Otherwise you have to resort to - invariably brittle - text parsing. As noted, data may already have been lost in the for-display representation.
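If you do control the creation step, writing structured data could look like this; a sketch with made-up sample records mirroring the question's data (`out.csv` is an assumed file name):

```powershell
# Build the data as objects with properties, then let Export-Csv
# serialize those properties directly - no for-display formatting involved.
$records =
  [pscustomobject] @{ id = 1; user = 'test_1'; mart = 3; desc = 'test result 1' },
  [pscustomobject] @{ id = 2; user = 'test_2'; mart = 3; desc = 'test result 2' }

$records | Export-Csv -NoTypeInformation out.csv

# Reading the file back later restores objects with the same properties.
Import-Csv out.csv | Format-Table
```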

The following transforms your fixed-column-width input file into CSV, based on the following assumptions:

  • The start of a column name implies where all values in that column start as well.
  • No column names contain embedded spaces or an embedded ,
  • No column values are multiline values.

# Read the file into the header line, the separator line (to be ignored),
# and the data lines, ignoring blank (empty or all-whitespace) lines.
$headerLine, $separatorLine, $dataLines = @(Get-Content .\test) -match '\S'

# Get the indices of the characters that end the columns.
# + -1 adds an extra array element that is a placeholder for the end of the line.
$colEndIndices = [regex]::Matches($headerLine, ' \S').Index + -1

# Transform the header line into a CSV header row.
$csvHeader = (-split $headerLine) -join ','

# Transform the data lines into in-memory CSV rows.
$csvData =
  $dataLines |
  ForEach-Object {
    # Split the line at hand into fields, trim each field and enclose it in "..."
    $pos = 0
    $fields = 
      foreach ($colEndIndex in $colEndIndices) {
        if ($colEndIndex -eq -1) { $colEndIndex = $_.Length - 1 } 
        '"' + $_.Substring($pos, $colEndIndex - $pos + 1).Replace('"', '""').Trim() + '"'
        $pos += $colEndIndex - $pos + 1
      }
    # Output the fields as a CSV line
    $fields -join ','
  }

# Save the in-memory CSV data to a file.
$csvHeader, $csvData | Set-Content txt.csv

Given your sample input file, the above saves the following content to txt.csv:

id,user,mart,desc
"1","test_1","3","test result 1..."
"2","test_2","3","test result 2"

The unconditional enclosure of the fields in "..." simplifies the code, but it wouldn't be hard to modify it to use double-quoting only when actually needed, namely for field values that contain , or " characters or newlines (n/a here).

To CSV parsers it shouldn't matter whether fields that don't require quoting are optionally enclosed in "..." or not.
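A quick way to verify this is a sketch with ConvertFrom-Csv, feeding it the same logical row once with and once without quoting:

```powershell
# The same logical row, once with quoted fields, once without:
$quoted   = 'id,desc', '"1","test result 1"' | ConvertFrom-Csv
$unquoted = 'id,desc', '1,test result 1'     | ConvertFrom-Csv

# Both parse to identical property values.
$quoted.id -eq $unquoted.id -and $quoted.desc -eq $unquoted.desc  # True
```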

mklement0
  • In general, I agree 10000% on the for-display vs structured data discussion. In this particular case, though, it appears the data happens to also be a fixed-width format that _may_ be readable. – Joel Coehoorn May 30 '23 at 19:50
  • @JoelCoehoorn, yes, fixed-width data _can_ be parseable, if it is unambiguous (and the solution in the answer does so), but note that the specific format of the sample data suggests that PowerShell's own `Format-Table` cmdlet was (possibly implicitly) used to create the file, and, if so, that the presence of `...` in a column indicates a _truncated_ value. – mklement0 May 30 '23 at 19:59