I have a file that is too large for me to open. However, I only need the last portion of this file. There is a line of text that includes this string: DATA FROM NSERCH= 249

If I can pull everything from that line to the end of the document, I should be able to open the file.

Ansgar Wiechers
  • Are you saying this works if the file is smaller, and you only need to solve the file size issue? – Bernard Vander Beken Feb 19 '19 at 14:57
  • Oh sorry, I don't know why I included that information in there. I literally just can't open the whole file in a text editor (it's over 1GB). But I know that the data that I need is in the last 10% of the file. So if I can produce a file that only contains that last 10% of text, I can get what I need. Might be worth noting that I'm a chemist and not very proficient with PowerShell, etc. – Chris Joseph Feb 19 '19 at 15:11
  • https://stackoverflow.com/questions/36507343/get-last-n-lines-or-bytes-of-a-huge-file-in-windows-like-unixs-tail-avoid-ti – Sanpas Feb 19 '19 at 15:34

2 Answers

You can use the Get-Content cmdlet for this: read the file line by line and start saving once you reach the line that contains the string you chose as the starting point:

$filename   = 'FULL PATH TO THE TOO LARGE TO OPEN FILE'
$outputPath = 'FULL PATH TO THE OUTPUT.TXT FILE'

$saveit = $false
Get-Content -Path $filename | ForEach-Object {
    # the $_ automatic variable represents a single line of the file
    if (-not $saveit) {
        # switch on at the marker line, so that line itself is saved too
        $saveit = ($_ -match 'DATA\s+FROM\s+NSERCH=\s+249')
    }
    if ($saveit) {
        Add-Content -Path $outputPath -Value $_
    }
}

If you are using PowerShell 3.0 or later (which runs on .NET 4.0 or higher), you can use the [System.IO.File]::ReadLines() method to speed things up. The code below does exactly the same thing:

$filename   = 'FULL PATH TO THE TOO LARGE TO OPEN FILE'
$outputPath = 'FULL PATH TO THE OUTPUT.TXT FILE'

$saveit = $false
foreach ($line in [System.IO.File]::ReadLines($filename)) {
    if (-not $saveit) {
        # switch on at the marker line, so that line itself is saved too
        $saveit = ($line -match 'DATA\s+FROM\s+NSERCH=\s+249')
    }
    if ($saveit) {
        Add-Content -Path $outputPath -Value $line
    }
}

Another alternative to Get-Content is to read the file through a StreamReader:

$filename   = 'FULL PATH TO THE TOO LARGE TO OPEN FILE'
$outputPath = 'FULL PATH TO THE OUTPUT.TXT FILE'

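$saveit = $false
$reader = [System.IO.File]::OpenText($filename)
try {
    while (!($reader.EndOfStream)) {
        $line = $reader.ReadLine()
        if (-not $saveit) {
            # switch on at the marker line, so that line itself is saved too
            $saveit = ($line -match 'DATA\s+FROM\s+NSERCH=\s+249')
        }
        if ($saveit) {
            Add-Content -Path $outputPath -Value $line
        }
    }
}
finally {
    # release the file handle even if the loop fails
    $reader.Close()
}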
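
Add-Content reopens the output file for every line it writes, which can be slow when the part to keep is large. One possible variation, offered only as a minimal sketch, writes through a single StreamWriter instead. The function name Copy-FromMarker and its parameters are hypothetical (not a built-in cmdlet), and the sketch assumes full paths and UTF-8 compatible text. It keeps the marker line as well as everything after it, as the question asks:

```powershell
# Hypothetical helper: the name and parameters are illustrative, not a built-in cmdlet.
# Copies the first line matching $Pattern, and every line after it, to $Destination.
function Copy-FromMarker {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Destination,
        [Parameter(Mandatory)] [string] $Pattern
    )

    $reader = $null
    $writer = $null
    try {
        # .NET resolves relative paths against the process working directory,
        # so full paths are the safe choice here.
        $reader = [System.IO.File]::OpenText($Path)
        $writer = [System.IO.File]::CreateText($Destination)   # overwrites an existing file

        $saveit = $false
        while ($null -ne ($line = $reader.ReadLine())) {
            if (-not $saveit) { $saveit = ($line -match $Pattern) }
            if ($saveit)      { $writer.WriteLine($line) }
        }
    }
    finally {
        # close both files even if reading or writing fails
        if ($writer) { $writer.Dispose() }
        if ($reader) { $reader.Dispose() }
    }
}
```

With the placeholder variables used above, it could be called as: Copy-FromMarker -Path $filename -Destination $outputPath -Pattern 'DATA\s+FROM\s+NSERCH=\s+249'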
Theo

Update: this is not a direct answer, but a workaround that searches only part of the file: read just the last N lines, and repeat with a larger N if the marker is not found.

The Get-Content -Tail <number of lines> parameter returns only the specified number of lines from the end of a file or other item, so you can use it to reduce the input. This parameter was introduced in PowerShell 3.0.
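
As a rough illustration of this workaround, the sketch below reads only the last lines of the file and returns everything from the marker line onward. The function name Get-TailFromMarker and its parameters are hypothetical, and the default of 100000 lines is an arbitrary starting guess rather than a value taken from the question:

```powershell
# Hypothetical helper: the name and parameters are illustrative, not a built-in cmdlet.
# Reads only the last $TailCount lines of $Path and returns the lines from the first
# match of $Pattern onward, or nothing if the marker is not within that range.
function Get-TailFromMarker {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Pattern,
        [int] $TailCount = 100000   # arbitrary starting guess; increase if needed
    )

    # @() keeps the result an array even when only one line comes back
    $tail = @(Get-Content -Path $Path -Tail $TailCount)

    for ($i = 0; $i -lt $tail.Count; $i++) {
        if ($tail[$i] -match $Pattern) {
            # emit the marker line and everything after it
            return $tail[$i..($tail.Count - 1)]
        }
    }

    Write-Warning "Marker not found in the last $TailCount lines; try a larger -TailCount."
}
```

The result can be piped to a new file, for example: Get-TailFromMarker -Path 'FULL PATH TO THE TOO LARGE TO OPEN FILE' -Pattern 'DATA\s+FROM\s+NSERCH=\s+249' | Set-Content -Path 'FULL PATH TO THE OUTPUT.TXT FILE'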

Bernard Vander Beken