Monthly I receive several very large (~ 4 GB) fixed column width text file that needs to be imported into MS SQL Server. To import the file, the file must be converted into a text file with tab-delimited column values with spaces trimmed from each column value (some columns have no spaces). I'd like to use PowerShell to solve this and I'd like the code to be very, very fast.
I tried many iterations of code but so far too slow or not working. I've tried the Microsoft Text Parser (too slow). I've tried regex matching. I'm working on a Windows 7 machine with PowerShell 5.1 installed.
ID FIRST_NAME LAST_NAME COLUMN_NM_TOO_LON5THCOLUMN
10000000001MINNIE MOUSE COLUMN VALUE LONGSTARTS
$infile = "C:\Testing\IN_AND_OUT_FILES\srctst.txt"
$outfile = "C:\Testing\IN_AND_OUT_FILES\outtst.txt"
$batch = 1
[regex]$match_regex = '^(.{10})(.{50})(.{50})(.{50})(.{50})(.{3})(.{8})(.{4})(.{50})(.{2})(.{30})(.{6})(.{3})(.{4})(.{25})(.{2})(.{10})(.{3})(.{8})(.{4})(.{50})(.{2})(.{30})(.{6})(.{3})(.{2})(.{25})(.{2})(.{10})(.{3})(.{10})(.{10})(.{10})(.{2})(.{10})(.{50})(.{50})(.{50})(.{50})(.{8})(.{4})(.{50})(.{2})(.{30})(.{6})(.{3})(.{2})(.{25})(.{2})(.{10})(.{3})(.{4})(.{2})(.{4})(.{10})(.{38})(.{38})(.{15})(.{1})(.{10})(.{2})(.{10})(.{10})(.{10})(.{10})(.{38})(.{38})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})(.{10})$'
[regex]$replace_regex = "`${1}`t`${2}`t`${3}`t`${4}`t`${5}`t`${6}`t`${7}`t`${8}`t`${9}`t`${10}`t`${11}`t`${12}`t`${13}`t`${14}`t`${15}`t`${16}`t`${17}`t`${18}`t`${19}`t`${20}`t`${21}`t`${22}`t`${23}`t`${24}`t`${25}`t`${26}`t`${27}`t`${28}`t`${29}`t`${30}`t`${31}`t`${32}`t`${33}"
Get-Content $infile -ReadCount $batch |
foreach {
$_ -replace $match_regex, $replace_regex | Out-File $outfile -Append
}
Any help you can give is appreciated!