
I need to retrieve the last n lines of huge files (1-4 GB) in Windows 7. Due to corporate restrictions, I cannot run any command that is not built-in. The problem is that all the solutions I have found appear to read the whole file, so they are extremely slow.

Can this be accomplished, fast?

Notes:

  1. I managed to get the first n lines, fast.
  2. It is ok if I get the last n bytes. (I used https://stackoverflow.com/a/18936628/2707864 for the first n bytes; a sketch of that kind of byte-level read is shown after this list.)
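
For reference, reading only the first n bytes can be done by pulling them straight from a stream. This is just an illustrative sketch of that kind of approach (not necessarily the linked answer's exact code), with $N and the path as placeholders:

$N = 100
$fpath = "C:\hugefile.dat"
$fs = [IO.File]::OpenRead($fpath)                  # open without reading the whole file
$buffer = New-Object Byte[] $N
$read = $fs.Read($buffer, 0, $N)                   # read only the first $N bytes
$fs.Close()
[System.Text.Encoding]::UTF8.GetString($buffer, 0, $read)   # choose the encoding that matches the file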

The solutions in Unix tail equivalent command in Windows Powershell did not work: using -Wait does not make it fast, and I do not have -Tail (and I do not know whether it would be fast).

PS: There are quite a few related questions about head and tail, but they are not focused on speed, so useful or accepted answers there may not be useful here. E.g.,

Windows equivalent of the 'tail' command

CMD.EXE batch script to display last 10 lines from a txt file

Extract N lines from file using single windows command

https://serverfault.com/questions/490841/how-to-display-the-first-n-lines-of-a-command-output-in-windows-the-equivalent

powershell to get the first x MB of a file

https://superuser.com/questions/859870/windows-equivalent-of-the-head-c-command

  • A batch file is a bad choice for that, because it is very difficult or even almost impossible to handle binary files correctly (I suppose you are dealing with such files, since you want to extract a certain number of _bytes_ rather than characters or lines); so I would definitely go for PS... – aschipfl Apr 08 '16 at 20:17
  • @aschipfl: batch files are much simpler & faster than PS – Zimba Oct 12 '21 at 07:49
  • @sancho: as a matter of interest, could you share your solution for reading the first n lines of a big file? I want to view the first couple of "lines" of a binary file that contains some text, but don't want to read the whole thing in... – Diomedea Aug 24 '22 at 15:40
  • @Diomedea - This is an old question. I am not sure I keep that old version, and I wouldn't know at the moment where to look. My apologies. – sancho.s ReinstateMonicaCellio Aug 24 '22 at 16:22

6 Answers


If you have PowerShell 3 or higher, you can use the -Tail parameter of Get-Content to get the last n lines of a file.

Get-Content -Tail 5 PATH_TO_FILE

On a 34 MB text file on my local SSD, this returned in 1 millisecond vs. 8.5 seconds for Get-Content | Select -Last 5.
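
If you want to reproduce such a comparison on your own files, Measure-Command gives quick timings; the path below is just a placeholder:

# Placeholder path; times the -Tail approach against the pipeline approach
Measure-Command { Get-Content -Tail 5 C:\path\to\big.log }
Measure-Command { Get-Content C:\path\to\big.log | Select-Object -Last 5 }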

alroc
  • Then get your environment upgraded to a recent release of PowerShell. Unless you have some weird compatibility issues that need to be preserved, there's no reason to not upgrade to at least v3, preferably 4 or 5 (whatever the highest one your systems support is). – alroc Apr 09 '16 at 13:26
  • Due to the same corporate restrictions (that I cannot run any command that is not built-in), I cannot upgrade either. I get what they give me. – sancho.s ReinstateMonicaCellio Apr 10 '16 at 05:41
  • Then your corporate IT environment is broken and I'd recommend looking for someplace that at least *attempts* to stay current on its software. – alroc Apr 10 '16 at 11:34
  • They may be "broken", or overburdened with work, or... This is not uncommon in large companies, where IT takes time to update the "standard environment". It may be a nuisance, but I would not change jobs because of this, unless it becomes a serious hurdle for performing my duties. This is not the case... – sancho.s ReinstateMonicaCellio Apr 10 '16 at 14:18
  • Sorry, but if a piece of software which is considered a core Windows component hasn't been considered for an upgrade in the more than 3 years since it was released, I perceive the environment as broken. What else is out of date, or even worse, left unpatched for security & bug fixes? How far can you really advance your own career and technical knowledge when you're saddled with out of date software? *That* is why you move on - because you can't improve your own skills in such an environment. – alroc Apr 10 '16 at 17:52
  • I don't work for a software company. Not having PS3 is not a symptom of a need for going elsewhere (even if it would be convenient to have it!). That is my perception. Thanks for the vividness! – sancho.s ReinstateMonicaCellio Apr 11 '16 at 08:28
  • I don't work for a software company either. But that doesn't mean that your corporate computing environment gets a free pass on being stuck on ancient software. Staying current on your software is part of the cost of doing business today, and if they're not willing to invest there, they're probably not investing in their people or other things that are important to keeping operations running. – alroc Apr 11 '16 at 11:21

How about this (reads last 8 bytes for demo):

$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-8, 'End') | Out-Null    # jump to 8 bytes before the end of the file
for ($i = 0; $i -lt 8; $i++)
{
    $fs.ReadByte()                # outputs each byte value
}
$fs.Close()                       # release the file handle

UPDATE. To interpret the bytes as a string (be sure to select the correct encoding - UTF8 is used here):

$N = 8
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$N, [System.IO.SeekOrigin]::End) | Out-Null
$buffer = new-object Byte[] $N
$fs.Read($buffer, 0, $N) | Out-Null
$fs.Close()
[System.Text.Encoding]::UTF8.GetString($buffer)

UPDATE 2. To read the last M lines, we read the file backwards in portions until the accumulated text contains at least M newline sequences:

$M = 3
$fpath = "C:\10GBfile.dat"

$result = ""
$seq = "`r`n"
$buffer_size = 10
$buffer = new-object Byte[] $buffer_size

$fs = [IO.File]::OpenRead($fpath)
while (([regex]::Matches($result, $seq)).Count -lt $M)
{
    $fs.Seek(-($result.Length + $buffer_size), [System.IO.SeekOrigin]::End) | Out-Null
    $fs.Read($buffer, 0, $buffer_size) | Out-Null
    $result = [System.Text.Encoding]::UTF8.GetString($buffer) + $result
}
$fs.Close()

($result -split $seq) | Select -Last $M

Try playing with a bigger $buffer_size - ideally it should be close to the expected average line length, to make fewer disk operations. Also pay attention to $seq - the line separator could be \r\n or just \n. This is very dirty code without any error handling or optimizations.
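
A caveat worth noting: Seek throws if the requested backward offset is larger than the file, and $result.Length counts decoded characters, which can differ from the number of bytes read when the file contains multi-byte UTF-8. This is only a hedged sketch of the same loop with a byte counter and a stop at the start of the file (same placeholder path and variable names as above, not the answer's original code):

$M = 3
$fpath = "C:\10GBfile.dat"
$seq = "`r`n"
$buffer_size = 4096
$buffer = New-Object Byte[] $buffer_size

$fs = [IO.File]::OpenRead($fpath)
$result = ""
[long]$bytes_back = 0                              # bytes consumed so far, counted from the end
while ((([regex]::Matches($result, $seq)).Count -lt $M) -and ($bytes_back -lt $fs.Length))
{
    # never read back past the start of the file
    $chunk = [int][Math]::Min([long]$buffer_size, $fs.Length - $bytes_back)
    $bytes_back += $chunk
    $fs.Seek(-$bytes_back, [System.IO.SeekOrigin]::End) | Out-Null
    $fs.Read($buffer, 0, $chunk) | Out-Null
    $result = [System.Text.Encoding]::UTF8.GetString($buffer, 0, $chunk) + $result
}
$fs.Close()
($result -split $seq) | Select-Object -Last $M

(A multi-byte character split across two chunks can still decode oddly at the seam; for plain ASCII logs that does not matter.)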

Aziz Kabyshev

When the file is already open in another process, it's better to use

Get-Content $fpath -tail 10

because otherwise [IO.File]::OpenRead may fail with: exception calling "OpenRead" with "1" argument(s): "The process cannot access the file..."
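
If you still want the stream-based approach from the other answers on a file that another process keeps open for writing (a growing log, for example), opening it with a permissive share mode usually avoids that exception. A hedged sketch with a placeholder path:

# Read-only open that lets other processes keep reading and writing the file
$fpath = "C:\10GBfile.dat"
$fs = New-Object IO.FileStream($fpath, [IO.FileMode]::Open, [IO.FileAccess]::Read, [IO.FileShare]::ReadWrite)
# ... Seek/Read as in the other answers ...
$fs.Close()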

Petr Spacek

With the awesome answer by Aziz Kabyshev, which solves the speed issue, and with some googling, I ended up using this script:

$fpath = $Args[1]
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$Args[0], 'End') | Out-Null
$mystr = ''
for ($i = 0; $i -lt $Args[0]; $i++)
{
    $mystr = ($mystr) + ([char[]]($fs.ReadByte()))
}
$fs.Close()
Write-Host $mystr

which I call from a batch file containing

@PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& '.\myscript.ps1' %1 %2"

(thanks to How to run a PowerShell script from a batch file).
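
For example, assuming the batch file is saved as tail.bat (a hypothetical name) next to myscript.ps1, getting the last 200 bytes of a log would look like:

tail.bat 200 C:\logs\huge.log

The first argument is the number of bytes ($Args[0] in the script) and the second is the file ($Args[1]).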

Community

This is not an answer, but a large comment as a reply to sancho.s' answer.

When you want to use small PowerShell scripts from a Batch file, I suggest you use the method below, which is simpler and keeps all the code in the same Batch file:

@PowerShell  ^
   $fpath = %2;  ^
   $fs = [IO.File]::OpenRead($fpath);  ^
   $fs.Seek(-%1, 'End') ^| Out-Null;  ^
   $mystr = '';  ^
   for ($i = 0; $i -lt %1; $i++)  ^
   {  ^
      $mystr = ($mystr) + ([char[]]($fs.ReadByte()));  ^
   }  ^
   Write-Host $mystr
%End PowerShell%
Aacini
  • This is very useful for me. A caveat: the way to execute this is with `myscript.bat nbytes 'myfile'`. Using single quotes around the filename is mandatory; neither no quotes nor double quotes worked, unlike when executing a batch file that calls a ps1 script. – sancho.s ReinstateMonicaCellio Apr 11 '16 at 15:38

Get last n bytes of a file:

set file="C:\Covid.mp4"
set n=7

copy /b %file% tmp
for %i in (tmp) do set /a m=%~zi-%n%
FSUTIL file seteof tmp %m%
fsutil file createnew temp 1
FSUTIL file seteof temp %n%
type temp >> tmp
fc /b tmp %file% | more +1 > temp

REM fc reports byte offsets in hex; convert them to decimal offsets before output
type nul > tmp
for /f "tokens=1-3 delims=: " %i in (temp) do set /a 0x%i >> tmp & set /p=": " <nul>> tmp & echo %j %k >> tmp

set /a n=%m%+%n%-1

REM output
type nul > temp
for /l %j in (%m%,1,%n%) do (find "%j: "<  tmp || echo doh: la 00)>> temp
(for /f "tokens=3" %i in (temp) do set /p=%i <nul) & del tmp & del temp

Tested on Windows 10 cmd on a Surface Laptop 1 (the for variables are written as %i for the interactive prompt; in a batch file they would be %%i).
Result: a 1.43 GB file processed in 10 seconds.

Zimba