1

I'm using PowerShell to run and export a SQL query. By default my output file has " " around the values, but one of the requirements is to only return raw data, so I piped in | % {$_ -replace '"',""} | to address that, however one of the fields is a text field that contains measurements that have " for length I need to maintain.

My questions is how to apply | % {$_ -replace '"',""} | to everything BUT one field.

theHydra
  • 21
  • 4
  • 1
    Without current and desired sample data it is more challenging to help you. – TheMadTechnician May 03 '23 at 19:49
  • 1
    I still want to remove the double quoting surrounding the results, for ALL fields, but want to retain the double quoting if it is part of the data results. "MOD70103- 24" X 16" PP2 54W" X 40D" CLR Black" should return MOD70103- 24" X 16" PP2 54W" X 40D" CLR Black – theHydra May 03 '23 at 19:58

2 Answers2

0

Note:

  • In PowerShell (Core) 7+ the Export-Csv and ConvertTo-Csv cmdlets now have -UseQuotes and -QuoteFields parameters, which would allow you to create your CSV data without double-quoted fields to begin with; e.g.:

    ... | Export-Csv out.csv -UseQuotes Never
    
  • In Windows PowerShell these cmdlets invariably output double-quoted fields, necessitating manual post-processing, as shown below.


Try the following:

... | % { $_ -replace '(?<=^|",)"|"(?=,"|$)' -replace '""', '"' }

Note:

  • The assumption is that fields with embedded " values do not also contain ,, because the removal of the enclosing double quotes would then break the partitioning of the line into fields.

  • The extra -replace operation, -replace '""', '"' unescapes the field-embedded " chars.

    • Note that a well-formed CSV requires " chars. inside double-quoted fields to be escaped as "".
      Export-Csv / ConvertTo-Csv do perform this escaping.

    • However, Import-Csv / ConvertFrom-Csv also tolerate input without this kind of escaping.

    • Either way, the above command works: if the embedded " chars. aren't escaped as "", -replace '""', '"' is simply a no-op.

A simple, self-contained example:

@'
"foo","bar"
"baz","qu""ux"
'@ -split '\r?\n' | 
  % { $_ -replace '(?<=^|",)"|"(?=,"|$)' -replace '""', '"' }

Output (only the field internal " was retained):

foo,bar
baz,qu"ux

For an explanation of the regex and the ability to experiment with it, see this regex101.com page - note that, because the linked page uses C# string-literal syntax, the " chars. in the regex are doubled.


Note that you can speed up processing considerably if you load your entire CSV file as a whole into memory and modify your regex with the Multiline regex option ((?m)), which then causes ^ and $ to match the start and end of each line:

(Get-Content -Raw your.csv) -replace '(?m)(?<=^|",)"|"(?=,"|$)' -replace '""', '"'
mklement0
  • 382,024
  • 64
  • 607
  • 775
-1

If I had a string content as similar to a CSV file content with 3 columns: Col1, Col2, Col3 I would write something like this:

$csvSimilarContent = @"
Col1,Col2,Col3
"AAA","MOD70103- 24`" X 16`" PP2 54W`" X 40D`" CLR Black","CCC"
"@

# takes a string that contains CSV data as input and creates a 
# custom object for each row in the CSV data. 
$content = $csvSimilarContent | ConvertFrom-Csv

# Create an empty array for CSV content
$modifiedContent = @()

# Iterate through each row in the CSV file
foreach ($row in $content) {
    # Remove surrounding double quotes in fields
    $row.Col1 = $row.Col1 -replace '^"|"$', ''
    $row.Col2 = $row.Col2 -replace '^"|"$', ''
    $row.Col3 = $row.Col3 -replace '^"|"$', ''

    # Add the modified row to the modified content array
    $modifiedContent += $row
}

# Display the modified content
$modifiedContent

The script removes only double quotes, that surround the fields.

Note: Extending arrays in a loop with += is inefficient

  • 1
    Note that the double quotes with _syntactic_ function, i.e.. those _enclosing_ field values are _removed_ when you use `Import-Csv` to parse a CSV file; that is, the property values of the resulting objects do _not_ contain these double quotes, so there is no point in trying to replace them. – mklement0 May 03 '23 at 21:20
  • 1
    If I read the question correctly, the intent is to remove the field-enclosing double quotes _from the text in the file_. In Windows PowerShell, `Export-Csv` invariably produces CSV files with fields enclosed in double quotes. While that _shouldn't_ be a problem for compliant CSV parsers (and it isn't for `Import-Csv`), situationally a file's consumer may choke on them. – mklement0 May 03 '23 at 21:20
  • 1
    As an aside: Extending arrays in a loop with `+=` is inefficient, because a _new_ array must be created behind the scenes _in every iteration_, given that arrays are of fixed size; a much more efficient approach is to use a `foreach` loop as an _expression_ and let PowerShell itself collect the outputs in an array: `[array] $outputs = foreach (...) { ... }` - see [this answer](https://stackoverflow.com/a/60708579/45375). In case you need to create arrays manually, e.g. to create _multiple_ ones, use an efficiently extensible list type - see [here](https://stackoverflow.com/a/60029146/45375). – mklement0 May 03 '23 at 21:21
  • Yes - of course - enclosing field values are removed when you using Import-Csv to parse a CSV file. – piotreks-uw-edu May 03 '23 at 21:37
  • So why are you trying to remove double quotes that are no longer there? – mklement0 May 03 '23 at 21:44
  • In your updated answer the `-replace` statements still aren't doing anything. – mklement0 May 03 '23 at 22:16
  • Also, instead of saying "this is inefficient", I suggest avoiding the ill-advised practice to begin with. – mklement0 May 03 '23 at 22:17