1

This is my first post here - my apologies in advance if I didn't follow a certain etiquette for posting. I'm a newbie to powershell, but I'm hoping someone can help me figure something out. I'm using the following powershell script to tell me the total count of rows in a CSV file, minus the header. This generated into a text file.

$x = (Get-Content -Path "C:\mysql\out_data\18*.csv" | Measure-Object -Line).Lines
$logfile = "C:\temp\MyLog.txt"
$files = get-childitem "C:\mysql\out_data\18*.csv"
foreach($file in $files)
 {
  $x--
  "File: $($file.name) Count: $x" | out-file $logfile -Append
 }

I am doing this for 10 individual files. But there is just ONE file that keeps adding exactly 807 more rows to the actual count. For example, for the code above, the actual row count (minus the header) in the file is 25,083. But my script above generates 25,890 as the count. I've tried running this for different iterations of the same type of file (same data, different days), but it keeps adding exactly 807 to the row count.

Even when running only (Get-Content -Path "C:\mysql\out_data\18*.csv" | Measure-Object -Line).Lines, I still see the wrong record count in the powershell window.

I'm suspicious that there may be a problem specifically with the csv file itself? I'm coming to that conclusion since 9 out of 10 files generate the correct row count. Thank you advance for your time.

CPG
  • 17
  • 5

2 Answers2

1

To measure the items in a csv you should use Import-Csv rather than Get-Content. This way you don't have to worry about headers or empty lines.

(Import-Csv -Path $csvfile | Measure-Object).Count
Doug Maurer
  • 8,090
  • 3
  • 12
  • 13
  • 1
    It also deals correctly with rows that span more than one line (newlines in a double-quoted field). – mklement0 Nov 17 '20 at 20:57
  • 1
    This worked out BEATIFULLY! I replaced the Get-Content with Import-Csv and I was able to obtain the correct count on the rows. It even helped me eliminate the decrement x$--, which is awesome. Thanks so much! – CPG Nov 17 '20 at 23:59
  • Awesome! Please consider accepting it as the answer for the sake of future visitors – Doug Maurer Nov 18 '20 at 00:13
0

It's definitely possible there's a problem with that csv file. Also, note that if the csv has cells that include linebreaks that will confuse Get-Content so also try Import-CSV

I'd start with this

$PathToQuestionableFile = "c:\somefile.csv"
$TestContents = Get-Content -Path $PathToQuestionableFile
Write-Host "`n-------`nUsing Get-Content:"
$TestContents.count
$TestContents[0..10]
$TestCsv = Import-CSV -Path $PathToQuestionableFile
Write-Host "`n-------`nUsing Import-CSV:"
$TestCsv.count
$TestCsv[0..10] | Format-Table

That will let you see what Get-Content is pulling so you can narrow down where the problem is.

If it is in the file itself and using Import-CSV doesn't fix it I'd try using Notepad++ to check both the encoding and the line endings

  • encoding is a drop down menu, compare to the other csv files
  • line endings can be seen with (View > Show Symbol > Show All Characters). They should be consistent across the file, and should be one of these
    • CR (typically if it came from a mac)
    • LF (typically if it came from *nix or the internet)
    • CRLF (typically if it came from windows)
Chris Rudd
  • 709
  • 7
  • 13
  • Thank you for this information regarding opening the csv on Notepadd ++. LF were the culprit in some records because there is a column where there are "notes", which causes it to force a line break in some instanaces. I appreciate the feedback, super helpful! – CPG Nov 17 '20 at 23:58