I have a large text (.txt) file containing several documents that need to be split into individual files.

There is a header of sorts at the start of each document that we can use to reference the start.

I would like to start a new file at this point and name each file with an incrementing number.

BONUS POINTS: parse each file after it has been split off and grab some text from it, for example "Doc No. 1", to use as the file name.

I tried this, as well as a few other suggestions, with no luck: https://forums.windowssecrets.com/showthread.php/174836-Powershell-Split-a-Text-File-Output-With-Delimiter-As-File-Name

  HEADER                                        EXAMPLE DATA
  HEADER                                        EXAMPLE DATA
  HEADER                                        EXAMPLE DATA

  ADDRESS CORRECTION REQUESTED                  Document No.         1

                                                period:
                                                DATE thru DATE

EXAMPLE DATA                    EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA


EXAMPLE DATA

          XXXXXXXXXXXX                             XXXX





  HEADER                                        EXAMPLE DATA
  HEADER                                        EXAMPLE DATA
  HEADER                                        EXAMPLE DATA

  ADDRESS CORRECTION REQUESTED                  Document No.         2

                                                period:
                                                DATE thru DATE

EXAMPLE DATA                    EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA


EXAMPLE DATA

          XXXXXXXXXXXX                             XXXX






  HEADER                                        EXAMPLE DATA
  HEADER                                        EXAMPLE DATA
  HEADER                                        EXAMPLE DATA

  ADDRESS CORRECTION REQUESTED                  Document No.         3

                                                period:
                                                DATE thru DATE

EXAMPLE DATA                    EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA


EXAMPLE DATA

          XXXXXXXXXXXX                             XXXX






  HEADER                                        EXAMPLE DATA
  HEADER                                        EXAMPLE DATA
  HEADER                                        EXAMPLE DATA

  ADDRESS CORRECTION REQUESTED                  Document No.         4

                                                period:
                                                DATE thru DATE

EXAMPLE DATA                    EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA
EXAMPLE DATA                        EXAMPLE DATA


EXAMPLE DATA

          XXXXXXXXXXXX                             XXXX
jdeuninck
  • Possible duplicate of [How can I split a text file using PowerShell?](https://stackoverflow.com/questions/1001776/how-can-i-split-a-text-file-using-powershell) – vonPryz Jan 31 '19 at 19:45
  • Welcome, jdeuninck! Try adding a small, representative chunk of your text file so we can see the structure. – Rich Moss Jan 31 '19 at 20:03
  • as others have pointed out, a sanitized and reasonably sized set of sample data is really a necessity for this. – Lee_Dailey Jan 31 '19 at 20:24
  • Done, thanks for the advice!! – jdeuninck Feb 01 '19 at 18:05
  • It's difficult to derive a RegEx pattern from that. Provided there are 4 to 7 newlines between documents and the header is indented by 2 spaces, you could use `-split '(?S)(?<=\S\r?\n)(\r?\n){4,}(?=\r?\n )'`, but this returns some empty (newline-only) splits. How do you want to name the files? –  Feb 03 '19 at 21:23
  • Incrementally is fine (i.e. file 1, file 2, and so on). – jdeuninck Feb 05 '19 at 00:06

1 Answer

Given a file SplitText.txt in the current folder:

> Get-Content .\SplitText.txt
xxx FirstFile zzz
FirstFile line 1
FirstFile line 2
FirstFile line 3
FirstFile line 4
FirstFile line 5
FirstFile line 6
xxx SecondFile zzz
SecondFile line A
SecondFile line B
SecondFile line C
SecondFile line D

This script splits it at each delimiter line and writes every section to its own file, named after the original BaseName with a running number appended:

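## Q:\Test\2019\01\31\SO_54467665.ps1
$File = Get-Item ".\SplitText.txt"
$i = 0
# Read the whole file as one string, split it on the delimiter line,
# and drop empty sections (-ne '' filters the resulting array).
(Get-Content $File -Raw) -split 'xxx .*? zzz\r?\n' -ne '' | ForEach-Object {
    $i++
    # Output name: <folder>\<BaseName>_<n><Extension>, e.g. SplitText_1.txt
    $Path = "{0}\{1}_{2}{3}" -f $File.DirectoryName, $File.BaseName, $i, $File.Extension
    $_ | Set-Content -Path $Path
}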

> Get-Content .\SplitText_1.txt
FirstFile line 1
FirstFile line 2
FirstFile line 3
FirstFile line 4
FirstFile line 5
FirstFile line 6

> Get-Content .\SplitText_2.txt
SecondFile line A
SecondFile line B
SecondFile line C
SecondFile line D
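
The split above consumes the delimiter line, so it does not show up in the output files. If that line has to stay, one option is to split on a zero-width lookahead instead, which matches the position in front of the delimiter without removing anything. The following is only a sketch: the here-string stands in for SplitText.txt, and `$outDir` and the `SplitText_keep_<n>.txt` names are made up for illustration. `-NoNewline` assumes PowerShell 5.0 or later.

```powershell
# Sample text matching SplitText.txt; the here-string stands in for the file.
$text = @'
xxx FirstFile zzz
FirstFile line 1
FirstFile line 2
xxx SecondFile zzz
SecondFile line A
SecondFile line B
'@

# (?m) lets ^ and $ match at every line boundary. The lookahead (?=...)
# matches the position in front of a delimiter line without consuming it,
# so the delimiter stays at the top of each section.
$sections = @($text -split '(?m)^(?=xxx .*? zzz\r?$)' -ne '')

# Hypothetical output folder (assumption, not from the original answer).
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'SplitText_keep'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

$i = 0
foreach ($section in $sections) {
    $i++
    $path = Join-Path $outDir "SplitText_keep_$i.txt"
    # -NoNewline avoids adding an extra line break after the section text.
    Set-Content -Path $path -Value $section -NoNewline
}
```

With a real file, `$text` would come from `Get-Content .\SplitText.txt -Raw` instead of the here-string.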
  • Do I replace 'xxx .*? zzz\r?\n' with what I'm searching for? It also appears to remove the header, and I need the header to be kept. – jdeuninck Jan 31 '19 at 22:38
  • You already got some comments asking you to [edit](https://stackoverflow.com/posts/54467665/edit) your question to include **your** sample file; I chose one from your links. The -split operator is regular-expression based and at the moment deletes this line. –  Jan 31 '19 at 23:09
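
For the layout later added to the question, a line-by-line pass may be easier to adapt than a single regex. The sketch below rests on assumptions that would need checking against the real file: every document starts with a run of lines matching `^\s*HEADER\b` (the real header text is sanitized in the question), and the number after "Document No." appears once per document. `$lines`, `$docs`, `$outDir` and the "Doc No. N.txt" naming are illustrative, not part of the original answer.

```powershell
# Shortened stand-in for the layout in the question; with a real file this
# would be: $lines = Get-Content .\YourFile.txt
$lines = @(
    '  HEADER                                        EXAMPLE DATA'
    '  HEADER                                        EXAMPLE DATA'
    ''
    '  ADDRESS CORRECTION REQUESTED                  Document No.         1'
    'EXAMPLE DATA                    EXAMPLE DATA'
    ''
    '  HEADER                                        EXAMPLE DATA'
    '  HEADER                                        EXAMPLE DATA'
    ''
    '  ADDRESS CORRECTION REQUESTED                  Document No.         2'
    'EXAMPLE DATA                    EXAMPLE DATA'
)

$docs = New-Object System.Collections.Generic.List[object]
$current = $null
$prevWasHeader = $false

foreach ($line in $lines) {
    $isHeader = $line -match '^\s*HEADER\b'
    # A HEADER line that does not directly follow another HEADER line
    # is treated as the start of a new document (assumption).
    if ($isHeader -and -not $prevWasHeader) {
        $current = [pscustomobject]@{
            Number = $null
            Lines  = New-Object System.Collections.Generic.List[string]
        }
        $docs.Add($current)
    }
    if ($current) {
        # The header lines are kept because every line is added as-is.
        $current.Lines.Add($line)
        if (-not $current.Number -and $line -match 'Document No\.\s+(\d+)') {
            $current.Number = $Matches[1]
        }
    }
    $prevWasHeader = $isHeader
}

# Hypothetical output folder.
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'SplitDocs'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

$n = 0
foreach ($doc in $docs) {
    $n++
    # Fall back to the running counter when no document number was found.
    $name = if ($doc.Number) { "Doc No. $($doc.Number)" } else { "$n" }
    $doc.Lines | Set-Content -Path (Join-Path $outDir "$name.txt")
}
```

Lines that appear before the first header are skipped in this sketch, and two documents with the same number would overwrite each other, so the naming rule may need adjusting for real data.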