
I'm trying to get the line count of a csv file. I ran this code to get it:

$csvPath = "some.csv"
$lineCount = "linecount.csv"
Import-Csv $csvPath | Select-Object -Skip 1 | Measure-Object | Export-CSV $lineCount -NoTypeInformation

This is the output I get in linecount.csv:

"Count","Average","Sum","Maximum","Minimum","Property"
"100",,,,,

This is my desired output:

"Count"
"100"

How can I achieve this?

mklement0
noobCoder
  • Change `Import-Csv` to `Get-Content`. I wouldn't recommend this method for a very large file tho, [this method](https://stackoverflow.com/a/76419378/15339544) would be the more efficient and recommended one. – Santiago Squarzon Jul 07 '23 at 21:08
  • Why are you skipping the first record with `Select-Object -Skip 1`? `Import-Csv` already eats the header, so your "line count" will be off by (at least) 2 – Mathias R. Jessen Jul 07 '23 at 21:10
  • You're right, Mathias! Thank you. I removed the `-Skip 1`, and it's now outputting the correct number; I verified it against the line count in the CSV and in the line count report. – noobCoder Jul 08 '23 at 03:18

1 Answer

To select a subset of the properties from an object / each object in a collection, pipe them to Select-Object and pass the property names of interest to the -Property parameter.

Therefore, to get only the Count property (using a simplified example):

# To pass *multiple* property names, separate them with ","
# Since -Property is the first positional parameter, you may omit its name.
1..10 | Measure-Object | Select-Object -Property Count
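To illustrate the comment about passing multiple property names, here is a small sketch. It adds -Sum (not part of the original example) so that there is a second meaningful property to keep alongside Count:

```powershell
# Ask Measure-Object to compute the sum too, then keep only two of its properties.
$result = 1..10 | Measure-Object -Sum | Select-Object -Property Count, Sum

# The result is a [pscustomobject] with exactly the two selected properties.
$result
```
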

Note:

  • For each input object, Select-Object outputs a [pscustomobject] instance that contains the specified properties.[1]

  • In addition to selecting existing properties as-is, you may define calculated properties, which let you define properties whose values are derived from the input objects' property values, rename properties, or add entirely new properties.

  • If you pipe the above to Export-Csv, you'll get the desired CSV output; to verify this in memory, pipe to ConvertTo-Csv instead.

  • By contrast, if you merely want the value of the Count property (rather than an object with a Count property), you can use -ExpandProperty instead of -Property, or pipe to ForEach-Object Count instead; however, the following is more concise and efficient:[2]

    (1..10 | Measure-Object).Count   # -> 10
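The notes above can be tried out together in one sketch. LineCount is a hypothetical column name chosen only for illustration; everything else uses the cmdlets and parameters the notes mention:

```powershell
# ConvertTo-Csv shows in memory what Export-Csv would write to a file.
# -NoTypeInformation suppresses the #TYPE line in Windows PowerShell;
# PowerShell 7+ omits that line by default anyway.
$csvLines = 1..10 | Measure-Object | Select-Object -Property Count |
  ConvertTo-Csv -NoTypeInformation
$csvLines   # -> "Count" and "10" (two lines)

# A calculated property that renames Count to a hypothetical LineCount column.
$renamed = 1..10 | Measure-Object |
  Select-Object -Property @{ Name = 'LineCount'; Expression = { $_.Count } }

# Three ways to get just the *value* of Count (an [int], not an object).
$viaExpand  = 1..10 | Measure-Object | Select-Object -ExpandProperty Count
$viaForEach = 1..10 | Measure-Object | ForEach-Object Count
$viaMember  = (1..10 | Measure-Object).Count
```

All three value-only variants yield the integer 10; only the Select-Object -Property variants produce objects that Export-Csv can turn into the desired CSV.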
    

As for what you're counting:

  • If you want to determine the number of data lines in your input CSV file (excluding the header line), use plain-text processing via Get-Content instead of Import-Csv:

    Get-Content $csvPath | Select-Object -Skip 1 | Measure-Object | 
      Select-Object Count | Export-CSV $lineCount -NoTypeInformation
    
    • Caveats:
      • CSV rows can span more than one line, so this method only reports the correct count if every row, including the header row, occupies exactly one line (while that is typical, it isn't guaranteed).

      • As Santiago points out, this method can perform poorly with large input files, in which case you're better off using the approach shown in this answer (https://stackoverflow.com/a/76419378/15339544).

  • If you want the number of data rows in your input CSV file, Import-Csv is all you need. There's no need for Select-Object -Skip 1, because Import-Csv returns objects representing just the data rows; the header information is "baked into" each such object, in the form of the object's property names:

    Import-Csv $csvPath | Measure-Object | 
      Select-Object Count | Export-CSV $lineCount -NoTypeInformation
    
    • Note:

      • While this is the correct and robust way to count data rows in a CSV file, it is even slower than the Get-Content method.
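The difference between counting lines and counting rows can be seen with a small, made-up sample file (the temp-file name and its content are hypothetical). The [System.IO.File]::ReadLines() part is one commonly used streaming technique for large files; it is not necessarily the approach in the linked answer:

```powershell
# Hypothetical sample CSV in a temp file. Row 2's quoted "Note" field contains
# a line break, so that single row spans two physical lines.
$demoPath = Join-Path ([System.IO.Path]::GetTempPath()) 'linecount-demo.csv'
@'
Id,Note
1,plain
2,"spans
two lines"
3,plain
'@ | Set-Content -LiteralPath $demoPath

# Plain-text count of data lines (header skipped): counts *physical* lines.
$textCount = (Get-Content -LiteralPath $demoPath | Select-Object -Skip 1 | Measure-Object).Count

# CSV-aware count of data rows: Import-Csv parses the quoted line break correctly.
$rowCount = (Import-Csv -LiteralPath $demoPath | Measure-Object).Count

# Streaming line count: .NET enumerates lines lazily, one at a time.
# Convert-Path yields a full path, since .NET's working directory can differ
# from PowerShell's current location when relative paths are used.
$lineTotal = 0
foreach ($line in [System.IO.File]::ReadLines((Convert-Path -LiteralPath $demoPath))) { $lineTotal++ }
$streamedDataLines = $lineTotal - 1   # minus the header line

"Get-Content: $textCount; Import-Csv: $rowCount; ReadLines: $streamedDataLines"

Remove-Item -LiteralPath $demoPath
```

With this sample, both line-based counts report 4, while Import-Csv reports 3, because row 2 occupies two physical lines.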

[1] In an effort to reflect the type of the objects that the properties were taken from, the output objects are assigned an additional type name, via PowerShell's ETS (Extended Type System), formed by prepending Selected. to the full type name of the input objects. In this example: Selected.Microsoft.PowerShell.Commands.GenericMeasureInfo. This ETS type name surfaces, for instance, when you pass such instances to Get-Member. It is also the first entry of the intrinsic .pstypenames property, and it appears in the type-identifying #TYPE comment line that Export-Csv / ConvertTo-Csv add to CSV output: by default in Windows PowerShell (unless -NoTypeInformation is used), and only with -IncludeTypeInformation in PowerShell (Core) 7+.
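A quick way to observe the ETS type name described in footnote [1] is sketched below. The version check is there because -IncludeTypeInformation exists only in PowerShell 6 and later, whereas Windows PowerShell writes the #TYPE line by default:

```powershell
$selected = 1..10 | Measure-Object | Select-Object -Property Count

# First entry of the intrinsic .pstypenames property: "Selected." + the input's full type name.
$selected.pstypenames[0]   # -> Selected.Microsoft.PowerShell.Commands.GenericMeasureInfo

# The same name appears in the #TYPE line of CSV output when type information is included.
$csvWithType = if ($PSVersionTable.PSVersion.Major -ge 6) {
  $selected | ConvertTo-Csv -IncludeTypeInformation   # opt-in in PowerShell 6+
} else {
  $selected | ConvertTo-Csv                           # included by default in Windows PowerShell
}
$csvWithType[0]   # -> #TYPE Selected.Microsoft.PowerShell.Commands.GenericMeasureInfo
```
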

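[2] The (...).PropertyName technique even works with multiple input objects, courtesy of PowerShell's member-access enumeration feature, as long as the resulting collection itself has no property of that name; notably, with multiple objects, (...).Count returns the number of objects rather than each object's Count value. However, member-access enumeration requires collecting all input objects in memory, up front. While this generally won't be a problem, it can be with large input collections whose property values must be processed one by one; in that case, use the Select-Object or ForEach-Object techniques.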
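Here is a sketch contrasting member-access enumeration with the streaming Select-Object and ForEach-Object alternatives, using made-up sample groups. It uses the Maximum property, which arrays themselves don't have, so member-access enumeration applies to each element:

```powershell
# Hypothetical sample data: three groups of numbers, each measured separately.
$groups = (1..2), (1..5), (1..7)

# Member-access enumeration: all results are collected first, then .Maximum
# is applied to each element of the collected array.
$measurements = $groups | ForEach-Object { $_ | Measure-Object -Maximum }
$maxima = $measurements.Maximum

# Streaming alternatives: each Maximum is emitted as soon as its result is produced.
$viaSelect  = $groups | ForEach-Object { $_ | Measure-Object -Maximum } |
  Select-Object -ExpandProperty Maximum
$viaForEach = $groups | ForEach-Object { $_ | Measure-Object -Maximum } |
  ForEach-Object Maximum

# .Count on the collected array refers to the array itself: 3 objects.
$measurements.Count
```
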

mklement0
  • Thank you for the in-depth explanation! I'm still learning PowerShell, so this was really insightful and educational. – noobCoder Jul 08 '23 at 03:17
  • I'm glad to hear it, @noobCoder; my pleasure. – mklement0 Jul 08 '23 at 04:46
  • Another way would be to specify the `Count` member. `(1..10 | Measure-Object).Count` – lit Jul 08 '23 at 16:59
  • Thanks, @lit - yes, if you want only the _value_ of the `.Count` property, that's the best approach. Note that the premise of the question is to get an _object with a `.Count` property_; however, I've added the `(...).Count` technique to the answer as well, along with an explanation of how it differs. – mklement0 Jul 08 '23 at 17:45