Occasionally, log (.txt) files are created that are too big to open (5 GB+). I need a way to split them into smaller, readable chunks that can be opened in WordPad. This is on Windows Server 2008 R2.

The solution needs to be a batch file, a PowerShell script, or something similar. Ideally, it should be hard-coded so that each output file is no larger than 999 MB and no file ends in the middle of a line.

I found a script close to what I need (it splits by line count), but it only works some of the time: https://gallery.technet.microsoft.com/scriptcenter/PowerShell-Split-large-log-6f2c4da0

############################################# 
# Split a log/text file into smaller chunks # 
############################################# 

# WARNING: This will take a long while with extremely large files and uses lots of memory to stage the file 

# Set the baseline counters  
# Set the line counter to 0  
$linecount = 0 

# Set the file counter to 1. This is used for the naming of the log files      
$filenumber = 1

# Prompt user for the path  
$sourcefilename = Read-Host "What is the full path and name of the log file to split? (e.g. D:\mylogfiles\mylog.txt)"   

# Prompt user for the destination folder to create the chunk files      
$destinationfolderpath = Read-Host "What is the path where you want to extract the content? (e.g. d:\yourpath\)"    
Write-Host "Please wait while the line count is calculated. This may take a while. No really, it could take a long time." 

# Find the current line count to present to the user before asking the new line count for chunk files  
Get-Content $sourcefilename | Measure-Object | ForEach-Object { $sourcelinecount = $_.Count }   

#Tell the user how large the current file is  
Write-Host "Your current file size is $sourcelinecount lines long"   

# Prompt user for the size of the new chunk files  
$destinationfilesize = Read-Host "How many lines will be in each new split file?"   

# The new size is read as a string, so convert it to an integer
# and use it as the upper boundary (maximum line count to write to each file)
$maxsize = [int]$destinationfilesize     
Write-Host File is $sourcefilename - destination is $destinationfolderpath - new file line count will be $destinationfilesize 

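# The process reads each line of the source file, writes it to the target log file and increments the line counter.
# When the counter reaches $maxsize (for example 100000 lines, approximately 50 MB of text data), the file number is incremented and the line counter is reset.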
$content = get-content $sourcefilename | % {
Add-Content $destinationfolderpath\splitlog$filenumber.txt "$_"    
$linecount ++   
If ($linecount -eq $maxsize) { 
    $filenumber++ 
    $linecount = 0    }  }   
# Clean up after your pet  
[gc]::collect()   
[gc]::WaitForPendingFinalizers()

However, when I run this, I get many errors in PowerShell similar to:

Add-Content : The process cannot access the file 'C:\Desktop\splitlog1.txt' 
because it is being used by another process...

So I am asking for help fixing the code above, or creating a different or better solution.

splattne
JavaBeast
  • to avoid such huge log files you could be interested in [LogRotateWin](http://sourceforge.net/projects/logrotatewin/)... – aschipfl Sep 02 '15 at 19:48
  • @aschipfl I appreciate your suggestion, however this is not really going to help in my case. – JavaBeast Sep 02 '15 at 19:53
  • I use a script derived from the same article all the time without any issues. Based on the error you're seeing, it looks like maybe you've got your destination file open somewhere else. Are you running `Get-Content split-log1.txt -tail` in another shell? – E.Z. Hart Sep 03 '15 at 19:44
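
A quick way to test the suspicion in the last comment is to try opening the output file exclusively. This is only a sketch: `Test-FileLocked` is a hypothetical helper name, not part of the original script, and it assumes the file already exists. It cannot tell which process holds the file, only whether an exclusive open fails right now.

```powershell
# Hypothetical helper: returns $true if the file cannot be opened exclusively,
# which usually means another handle (another shell, a viewer, a scanner) has it open.
function Test-FileLocked
{
    param
    (
        [Parameter(Mandatory = $true)]
        [string]$Path
    )

    # Resolve to a full path first, because .NET's current directory
    # can differ from PowerShell's current location.
    $fullPath = (Resolve-Path -LiteralPath $Path -ErrorAction Stop).ProviderPath

    try
    {
        $stream = [System.IO.File]::Open(
            $fullPath,
            [System.IO.FileMode]::Open,
            [System.IO.FileAccess]::ReadWrite,
            [System.IO.FileShare]::None
        )
        $stream.Dispose()
        return $false
    }
    catch [System.IO.IOException]
    {
        # A sharing violation surfaces as an IOException
        return $true
    }
}

# Example (path taken from the error message above):
# Test-FileLocked -Path 'C:\Desktop\splitlog1.txt'
```

If this returns True while the split script is not running, something else has the file open.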

2 Answers

OK, I rose to the challenge. Here is a function that should work for you. It splits text file(s) by line, putting as many complete lines as possible into each output file until the size limit is reached.

Note: the output file size limit can't be strictly enforced. The size is checked before each line is written, so a file can exceed the limit by up to one line.

Example: the input file contains two very long lines, 1 MB each. If you try to split it into 512 KB chunks, the resulting files will be 1 MB each.
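
The effect of that note can be sketched without touching the disk. `Get-ChunkPlan` below is a hypothetical helper, not part of the function in this answer. It applies the same rule (check the current size first, then write the line) to a list of line sizes and reports which chunk each line would land in.

```powershell
# Hypothetical helper: simulates the check-then-write rule on line sizes (in bytes)
function Get-ChunkPlan
{
    param
    (
        [long[]]$LineSize,
        [long]$MaxFileSize
    )

    $chunk = 0
    $written = 0L

    foreach ($size in $LineSize)
    {
        # Start a new chunk only if the current one has already reached the limit
        if ($written -ge $MaxFileSize)
        {
            $chunk++
            $written = 0L
        }

        # The whole line is always written, even if it pushes the chunk over the limit
        $written += $size
        New-Object -TypeName PSObject -Property @{ Chunk = $chunk; LineSize = $size }
    }
}

# Two 1 MB lines with a 512 KB limit
Get-ChunkPlan -LineSize 1MB, 1MB -MaxFileSize 512KB
```

With these inputs the first line goes to chunk 0 and the second to chunk 1, so each chunk holds 1 MB despite the 512 KB limit.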

Function Split-FileByLine:

<#
.Synopsis
    Split text file(s) by lines, put into each output file as many complete lines of input as possible without exceeding size bytes.

.Description
    Split text file(s) by lines, put into each output file as many complete lines of input as possible without exceeding size bytes.
    Note that the output file size limit can't be strictly enforced. Example: the input file contains two very long lines, 1 MB each.
    If you try to split this file into 512 KB chunks, the resulting files will be 1 MB each.

    Split files will have the original file's name, followed by the "_part_" string and a counter. Example:
    Original file: large.log
    Split files: large_part_0.log, large_part_1.log, large_part_2.log, etc.

.Parameter FileName
    Array of strings, mandatory. Filename(s) to split.

.Parameter OutPath
    String. Folder where the split files will be stored. Defaults to the current location. Will be created if it doesn't exist.

.Parameter MaxFileSize
    Long, mandatory. Maximum output file size. When an output file reaches this size, a new file is created.
    You can use PowerShell's multipliers: KB, MB, GB, TB, PB

.Parameter Encoding
    String. If not specified, script will use system's current ANSI code page to read the files.
    You can get other valid encodings for your system in PowerShell console like this:

    [System.Text.Encoding]::GetEncodings()

    Example:

    Unicode (UTF-7): utf-7
    Unicode (UTF-8): utf-8
    Western European (Windows): Windows-1252

.Example
    Split-FileByLine -FileName '.\large.log' -OutPath '.\splitted' -MaxFileSize 100MB -Verbose

    Split file "large.log" in the current folder, write the resulting files to the subfolder "splitted", limit output file size to 100 MB, be verbose.

.Example
    Split-FileByLine -FileName '.\large.log' -OutPath '.\splitted' -MaxFileSize 100MB -Encoding 'utf-8'

    Split file "large.log" in the current folder, write the resulting files to the subfolder "splitted", limit output file size to 100 MB, use UTF-8 encoding.

.Example
    Split-FileByLine -FileName '.\large_1.log', '.\large_2.log' -OutPath '.\splitted' -MaxFileSize 999MB

    Split files "large_1.log" and "large_2.log" in the current folder, write the resulting files to the subfolder "splitted", limit output file size to 999 MB.

.Example
    '.\large_1.log', '.\large_2.log' | Split-FileByLine -OutPath '.\splitted' -MaxFileSize 999MB

    Same as the previous example, but the file names are passed through the pipeline.

#>
function Split-FileByLine
{
    [CmdletBinding()]
    Param
    (
        [Parameter(Mandatory = $true, ValueFromPipeline = $true, ValueFromPipelineByPropertyName = $true)]
        [string[]]$FileName,

        [Parameter(ValueFromPipelineByPropertyName = $true)]
        [string]$OutPath = (Get-Location -PSProvider FileSystem).Path,

        [Parameter(Mandatory = $true, ValueFromPipelineByPropertyName = $true)]
        [long]$MaxFileSize,

        [Parameter(ValueFromPipelineByPropertyName = $true)]
        [string]$Encoding = 'Default'
    )

    Begin
    {
        # Scriptblocks for common tasks
        $DisposeInFile = {
            Write-Verbose 'Disposing StreamReader'
            $InFile.Close()
            $InFile.Dispose()
        }

        $DisposeOutFile = {
            Write-Verbose 'Disposing StreamWriter'
            $OutFile.Flush()
            $OutFile.Close()
            $OutFile.Dispose()
        }

        $NewStreamWriter = {
            Write-Verbose 'Creating StreamWriter'
            $OutFileName = Join-Path -Path $OutPath -ChildPath (
                '{0}_part_{1}{2}' -f [System.IO.Path]::GetFileNameWithoutExtension($_), $Counter, [System.IO.Path]::GetExtension($_)
            )

            $OutFile = New-Object -TypeName System.IO.StreamWriter -ArgumentList (
                $OutFileName,
                $false,
                $FileEncoding
            ) -ErrorAction Stop
            $OutFile.AutoFlush = $true
            Write-Verbose "Writing new file: $OutFileName"
        }
    }

    Process
    {
        if($Encoding -eq 'Default')
        {
            # Set default encoding
            $FileEncoding = [System.Text.Encoding]::Default
        }
        else
        {
            # Try to set user-specified encoding
            try
            {
                $FileEncoding = [System.Text.Encoding]::GetEncoding($Encoding)
            }
            catch
            {
                throw "Not valid encoding: $Encoding"
            }
        }

        Write-Verbose "Input file: $FileName"
        Write-Verbose "Output folder: $OutPath"

        if(!(Test-Path -Path $OutPath -PathType Container)){
            Write-Verbose "Folder doesn't exist, creating: $OutPath"
            $null = New-Item -Path $OutPath -ItemType Directory -ErrorAction Stop
        }

        $FileName | ForEach-Object {
            # Open input file
            $InFile = New-Object -TypeName System.IO.StreamReader -ArgumentList (
                $_,
                $FileEncoding
            ) -ErrorAction Stop
            Write-Verbose "Current file: $_"

            $Counter = 0
            $OutFile = $null

            # Read lines from input file
            while(($line = $InFile.ReadLine()) -ne $null)
            {
                if($OutFile -eq $null)
                {
                    # No output file, create StreamWriter
                    . $NewStreamWriter
                }
                else
                {
                    if($OutFile.BaseStream.Length -ge $MaxFileSize)
                    {
                        # Output file reached size limit, closing
                        Write-Verbose "OutFile length: $($OutFile.BaseStream.Length)"
                        . $DisposeOutFile
                        $Counter++
                        . $NewStreamWriter
                    }
                }

                # Write line to the output file
                $OutFile.WriteLine($line)
            }

            Write-Verbose "Finished processing file: $_"
            # Close open files and cleanup objects
            . $DisposeOutFile
            . $DisposeInFile
        }
    }
}

You can use it in your script like this:

function Split-FileByLine
{
    # function body here
}

$InputFile = 'c:\log\large.log'
$OutputDir = 'c:\log_split'

Split-FileByLine -FileName $InputFile -OutPath $OutputDir -MaxFileSize 999MB
beatcracker
  • Something seems off... I used it to split a file that was 983,336 KB (at 200 MB max each) and it gave me 4 files (204,801 KB / 204,801 KB / 204,801 KB / 164,136 KB)... notice they don't add up to 983,336 KB. Does this indicate loss of data somewhere? If I manually split a file, the sizes do add up to the original. – JavaBeast Sep 04 '15 at 21:08
  • @JavaBeast Weird, I'll check this. – beatcracker Sep 04 '15 at 21:13
  • @JavaBeast Yep, a bug in the counter led to the first split file being overwritten. Check the updated version. – beatcracker Sep 04 '15 at 21:58
  • I know I'm supposed to avoid comments like "Thanks!" but I cannot stop myself. THANK YOU! function works perfectly and is saving me a ton of time. @beatcracker wins the Internet for the day – Rocky Sep 01 '16 at 00:16
  • Very nice script but not really efficient for very large files. I had a 1G file I needed to split. I set the limit at 200MB. However, I aborted the script after it still had not completed the first file after 15 minutes. Way too long! Found another script which gave me what I needed in less than a minute: http://stackoverflow.com/questions/1001776/how-can-i-split-a-text-file-using-powershell – Chip Wood Oct 13 '16 at 14:47
  • @Woody I've just checked out of curiosity and it took ~9 seconds to split 1Gb log file in 200Mb chunks using this function. Weird... – beatcracker Oct 13 '16 at 15:29
  • @beatcracker Yes, very strange indeed! I wrapped the function in a Measure-Command cmdlet and it took 6 hours, 8 minutes, and 11 seconds. The other method took 46 seconds. Both methods ran on the same machine using the same file as input. They both produced 10 output files and both used 100MB as the split criteria. – Chip Wood Oct 18 '16 at 10:33
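
To check whether a split lost data, as asked in the comments above, comparing line counts is likely more reliable than comparing byte totals. Byte totals can differ legitimately: `StreamWriter.WriteLine` ends every line with the platform newline, and an encoding with a preamble may add a byte order mark to each part. The sketch below assumes the `<name>_part_<n><ext>` naming used by the function; `Get-SplitSummary` is a hypothetical helper name.

```powershell
# Hypothetical helper: compares an original file with its "_part_" files
function Get-SplitSummary
{
    param
    (
        [Parameter(Mandatory = $true)][string]$InputFile,
        [Parameter(Mandatory = $true)][string]$OutputDir
    )

    # Count lines with a StreamReader so the file is never loaded into memory
    $countLines = {
        param([string]$Path)
        $reader = New-Object -TypeName System.IO.StreamReader -ArgumentList $Path
        try
        {
            $n = 0L
            while ($null -ne $reader.ReadLine()) { $n++ }
            $n
        }
        finally
        {
            $reader.Dispose()
        }
    }

    $original = Get-Item -LiteralPath $InputFile -ErrorAction Stop
    $pattern = '{0}_part_*{1}' -f $original.BaseName, $original.Extension
    $parts = @(Get-ChildItem -Path $OutputDir -Filter $pattern)

    $partBytes = 0L
    $partLines = 0L
    foreach ($part in $parts)
    {
        $partBytes += $part.Length
        $linesInPart = & $countLines $part.FullName
        $partLines += $linesInPart
    }

    $originalLines = & $countLines $original.FullName

    New-Object -TypeName PSObject -Property @{
        PartCount     = $parts.Count
        OriginalBytes = $original.Length
        PartBytes     = $partBytes
        OriginalLines = $originalLines
        PartLines     = $partLines
    }
}

# Example (paths are placeholders):
# Get-SplitSummary -InputFile 'c:\log\large.log' -OutputDir 'c:\log_split'
```

If OriginalLines and PartLines match, no lines were dropped, even when the byte totals differ slightly.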
You could try the split tool from CoreUtils for Windows with the --line-bytes parameter:

--line-bytes=size

Put into each output file as many complete lines of input as possible without exceeding size bytes. Individual lines or records longer than size bytes are broken into multiple files. size has the same format as for the --bytes option. If --separator is specified, then lines determines the number of records.

Example: split --line-bytes=999MB c:\logs\biglog.txt

beatcracker
  • Thanks, but I am unable to add or install any tools on the client work station. I need a solution in a 1 document script form that I can simply pass on to the user. – JavaBeast Sep 02 '15 at 21:41