
In my understanding, (Get-Content $file | Measure-Object).count should return the total number of lines in the file. It does for some files, but for some generated files the result is 1 less than the correct number. I examined them and saw that their last line doesn't end with CRLF.

Why is this the case? And how can I make sure that I get the correct result? The Measure-Object documentation doesn't seem to explain this.

Ooker

1 Answer


The behavior is unrelated to Measure-Object:

It is Get-Content that reads and streams a given file's lines one by one, each stripped of its trailing newline (which may or may not be present on the last line).

That is, to Get-Content it doesn't make a difference whether the last line has a trailing newline, i.e. whether the file ends with a newline or not.
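To see this for yourself, here is a minimal sketch that writes two otherwise identical files, one ending in CRLF and one not, and counts their lines the same way as in the question. The scratch directory and file names are made up for illustration; `[IO.File]::WriteAllText` is used so that exactly the given characters are written.

```powershell
# Hypothetical scratch directory; any writable location would do.
$dir = Join-Path ([IO.Path]::GetTempPath()) ([guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $dir
$withNewline    = Join-Path $dir 'with.txt'
$withoutNewline = Join-Path $dir 'without.txt'

# Write the exact characters; nothing is appended implicitly.
[IO.File]::WriteAllText($withNewline,    "one`r`ntwo`r`n")
[IO.File]::WriteAllText($withoutNewline, "one`r`ntwo")

# Count lines the way the question does.
$countWith    = (Get-Content $withNewline    | Measure-Object).Count
$countWithout = (Get-Content $withoutNewline | Measure-Object).Count

"With trailing newline:    $countWith"
"Without trailing newline: $countWithout"

# Clean up the scratch directory.
Remove-Item -Recurse $dir
```

Both counts should come out as 2, because Get-Content emits the strings `one` and `two` in either case.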

You need a different approach if you want to count the actual number of newlines (LF or CRLF sequences) in a file (and possibly add 1 if you want to consider the final newline a line in its own right), e.g.:

# Count the number of newlines
[regex]::Matches((Get-Content -Raw $file), '\n').Count

Alternatively:

((Get-Content -Raw $file) -replace '[^\n]').Length

Note the use of -Raw with Get-Content, which reads the file as a whole into a single, (typically) multiline string, thereby preserving all newlines.
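If you want both numbers at once, the following sketch wraps the regex approach in a small function. The name `Get-LineCountInfo` and its property names are hypothetical, not built-in; it assumes LF or CRLF newlines only and reads the whole file into memory, so it may not suit very large files.

```powershell
function Get-LineCountInfo {   # hypothetical helper, not a built-in cmdlet
    param([Parameter(Mandatory)] [string] $Path)

    # Get-Content -Raw outputs nothing for a 0-byte file;
    # the [string] cast turns that into an empty string.
    $text = [string] (Get-Content -Raw -LiteralPath $Path)

    # Number of newlines (the LF in each LF or CRLF sequence).
    $newlines = [regex]::Matches($text, '\n').Count

    # True if the text ends in a character other than LF,
    # i.e. the last line has no trailing newline.
    $hasIncompleteLine = $text -match '[^\n]\z'

    [pscustomobject]@{
        Newlines                 = $newlines
        LinesIncludingIncomplete = $newlines + [int] $hasIncompleteLine
    }
}
```

Called as `Get-LineCountInfo $file`, the `Newlines` property is the raw newline count, while `LinesIncludingIncomplete` also counts a final line that lacks a trailing newline.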

mklement0
  • 1
    I think what confused me here is the difference between how this cmdlet defines a line and how editors like Notepad++ show "lines". I think `Get-Content` counts the number of non-CRLF strings separated by CRLF, while editors move the cursor down a line whenever a CRLF appears. So for a text with a CRLF at the end, the number of lines shown in an editor is larger by 1 than the number of non-CRLF strings separated by CRLF. I think `Get-Content` has the better definition of a line. – Ooker Aug 26 '23 at 12:43
  • 1
    @Ooker - I guess it depends on whether you consider an empty string as a line of text - if the file consists entirely of just a `CRLF`, the empty string *before* the `CRLF` will presumably count as a line, so why not the empty string *after* as well? And by extension, does a 0-byte file have 0 or 1 line? (Rhetorical questions, btw - I don’t think either way is objectively “right” - just depends on what you’re doing at the time:-)) – mclayton Aug 26 '23 at 20:06
  • 1
    @mclayton, good points. To POSIX, a newline is a _mandatory_ line _terminator_. Any nonempty run of non-newline characters at the end of a file is an _incomplete line_. Thus, there is no ambiguity: The number of newlines equals the number of lines, and `wc -l` simply reports the _count of newlines_, which means that an incomplete line is _not_ counted. A `0`-byte file as well as one consisting _only_ of an incomplete line therefore has a line count of `0`. – mklement0 Aug 28 '23 at 21:09
  • 1
    To PowerShell, by contrast, a newline is an _optional_ line terminator, meaning that the last line may, but needn't, be terminated with a newline. De facto, because `Get-Content` sends _nothing_ through the pipeline for a `0`-byte file, `Measure-Object` reports a line count of `0` in that case, but it _does_ count lines that are incomplete in a POSIX sense, because `Get-Content` reports incomplete lines just like complete ones. See [GitHub issue #3911](https://github.com/PowerShell/PowerShell/issues/3911) for a discussion of how `Get-Content -Raw` should handle `0`-byte files. – mklement0 Aug 28 '23 at 21:09