0

I'm working with a script that opens up a csv, splits it into multiple smaller csvs than compresses the files so they can be easily uploaded. The script does a fine job at this, but I need a way to escape double quotes (") so they are treated as text. I've included the part of the script that does the splitting below. Thanks!

param([string]$in_file, [string]$out_folder, [int]$linesperFile = 100000, [string]$ext = 'csv', [string]$header='Y', [string]$compress='Y')

Function Split-File{
    Param(
        $in_file,
        $out_folder,
        $linesperFile,
        $ext,
        $header
        )
    $filecount = 1
    $reader = $null
    $header_row = $null
    try{
        $reader = [io.file]::OpenText($in_file)
        if ($header -ne 'N') {
            $header_row = $reader.ReadLine()
        }
        try{
            $writer = [io.file]::CreateText("{0}{1}.{2}" -f ($out_folder,$filecount.ToString("000"),$ext))
            $filecount++
            $linecount = 0
            
            while($reader.EndOfStream -ne $true) {
                if ($header -eq 'Y') {
                    $writer.WriteLine($header_row)
                }
                while( ($linecount -lt $linesperFile) -and ($reader.EndOfStream -ne $true)){
                    $writer.WriteLine($reader.ReadLine());
                    $linecount++
                }
    
                if($reader.EndOfStream -ne $true) {
                    $writer.Dispose();
                    $writer = [io.file]::CreateText("{0}{1}.{2}" -f ($out_folder,$filecount.ToString("000"),$ext))
                    $filecount++
                    $linecount = 0
                }
            }
        } finally {
            $writer.Dispose();
        }
    } finally {
        $reader.Dispose();
    }
    return $filecount

I attempted iRon's solution and it still isn't working... It reaches one of the lines with the double quotes and then just starts repeating the same line over and over.

enter image description here

The data below is supposed to be a single row of data enter image description here

enter image description here

  • 1
    Why not using the [Convert*-Csv](https://learn.microsoft.com/powershell/module/microsoft.powershell.utility/convertto-csv) cmdlets and the [steppable pipeline](https://devblogs.microsoft.com/powershell-community/mastering-the-steppable-pipeline/#the-steppable-pipeline)? See also: [Trouble Splitting a 9 GB csv file via Powershell](https://stackoverflow.com/a/75382747/1701026) – iRon Jul 13 '23 at 18:27
  • As an aside: it's better to represent Boolean values (`[bool]`) as `$true` and `$false`, and to implement Boolean _parameters_ as `[switch]` parameters, where the presence of the switch (e.g., `-compress`) implies `$true`, and the absence `$false`. – mklement0 Jul 13 '23 at 18:32
  • @mklement0 The existence of double quotes in the csv is causing split at points it should not, stopping the current line/row of data immediately and continuing with that data on the next line which throws everything off. – PracticingPython Jul 13 '23 at 20:32
  • So you're saying that some of your CSV data rows _span more than one line_, correct? If so, iRon's suggestion is the simplest solution, though it won't be fast. If you need a plain-text solution for performance: are _all_ fields quoted in the input file? If there are any field-internal `"` chars., how are they escaped? (`""` vs. ``\"``) – mklement0 Jul 13 '23 at 20:42
  • 1
    I meant CSV _source data_. On a general note, please (also) post your code, data, error messages as _text_, [not (just) as _images_](https://meta.stackoverflow.com/a/285557/45375). – mklement0 Jul 13 '23 at 22:01
  • I suspect that your input `csv` isn't (**RFC 4180 and MIME standards**) valid by itself ([*Fields containing a line-break, double-quote or commas should be quoted. (If they are not, the file will likely be impossible to process correctly.)*](https://en.wikipedia.org/wiki/Comma-separated_values) and likely also fails on other applications as Excel. Please add an example of the record/line with the concerned double quotes that is failing to the question. – iRon Jul 14 '23 at 08:05
  • @iRon OP Updated – PracticingPython Jul 14 '23 at 12:44
  • 1
    Please provide a minimal sample of the CSV _source data_, as _text_. – mklement0 Jul 14 '23 at 12:59
  • Please, [DO NOT post images of code, data, error messages, etc.](https://meta.stackoverflow.com/a/285557), see [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask). Besides, the last image already suggests that the input data is a correct formatted `csv` (e.g. what is the meaning of the double quote in the lower left cell???) – iRon Jul 14 '23 at 15:09

0 Answers0