
I wrote this PowerShell script and it works, but it takes about 55 seconds to run on a single file, and I need to run it against 435 text files of roughly 650 kB each.

Processing all 435 files takes over 7 hours! Is there any way to make this faster, ideally cutting the time at least in half? I don't know whether the problem is the loop.

$Path = "C:\Users\rfp6fkj\Desktop\Group\*"
# Get-ChildItem returns an array of the *.Group files to enumerate/loop through
$Files = Get-ChildItem "$Path.Group"
# $connection: an already-open System.Data.SqlClient.SqlConnection is assumed (setup not shown)
# strip comment lines and blank/whitespace-only lines from each file in place
foreach ($File in $Files) {
    (Get-Content $File) -notmatch '^#' | Where-Object { $_.Trim(" `t") } | Set-Content $File
}

# explode the comma-separated UserID column of GroupPasswordAge into one row per
# user in tblDataParsed, via the ParseDelimValues table-valued function
function GoParseandInsert {
    try {
        #$connection.Open()
        $cmd = $connection.CreateCommand()
        $insert_stmt = "INSERT INTO PasswordAge.[dbo].[tblDataParsed]
            ([Server]
            ,[Group]
            ,[UserID])

        SELECT I.Server,[Group],F.Val AS TheUserID
        FROM  PasswordAge.[dbo].[GroupPasswordAge] I
              CROSS APPLY PasswordAge.[dbo].ParseDelimValues(CASE WHEN I.UserID = '' THEN 'No Data in Column' ELSE I.UserID END,',') F
        WHERE I.UserID <> ''" 

        $cmd.CommandText = $insert_stmt
        #Write-Output $insert_stmt
        $cmd.ExecuteNonQuery()
    } finally {
        if ($connection -and ($connection.State -eq 'Open')) {
            $connection.Close()
        }
    }
} 

try {
    foreach ($File in $Files) {
        $TheFileName = $File.Basename 
        $StringContent = Get-Content $File 

        foreach ($Thing in $StringContent) {
            $index = $Thing.IndexOf(":")
            $GroupName = $Thing.Substring(0, $index)
            $cmd = $Connection.CreateCommand()
            $BetterThing = $Thing.Split(":")[3]

            $insert_stmt = "INSERT INTO [dbo].[GroupPasswordAge]
                ([Server] 
                ,[Group] 
                ,[UserID])
            VALUES
                ('$TheFileName','$GroupName','$BetterThing')" -replace "\s+"," "

            $cmd.CommandText = $insert_stmt
            Write-Host $insert_stmt -ForegroundColor DarkCyan
            $cmd.ExecuteNonQuery()
        }
    } 
} catch {
    # if files don't exist or an insert fails, report the exception
    Write-Host "Caught an exception:" -ForegroundColor Red
    Write-Host "Exception Type: $($_.Exception.GetType().FullName)" -ForegroundColor DarkRed
    Write-Host "Exception Message: $($_.Exception.Message)" -ForegroundColor Red
} finally {
    Invoke-Sqlcmd -Query "Truncate table PasswordAge..tblDataParsed"
    GoParseandInsert
}

if ($connection -and ($connection.state -eq 'Open')) {
    $connection.Close()
}
  • How good is your SQL? I would suggest building larger statements instead of executing individual inserts. You might be able to work in batches of 10, for example; that should be faster. You also don't use the pipeline efficiently: you don't need to keep saving data in memory. It might look cleaner, but there would be a small loss there for sure. Consider taking this to [codereview.se], since the answer could be too broad for here. You don't have to, though. – Matt Aug 22 '17 at 13:50
  • Recreating the `$cmd` for each line is probably not required. I would move that out of the loop. Would be curious to see some sample lines of text for this as well. – Matt Aug 22 '17 at 14:00
  • I had a similar task before (although I used Perl, as I don't know PowerShell), and I found that using transactions makes the process faster. By using transactions and committing every 500 queries I was able to reduce the processing time by 70%. It didn't require a powerful computer at the time, just a normal Core2Duo with 4 GB of RAM, so I guess any modern computer should do better these days. – ahmad Aug 22 '17 at 14:06
  • A while ago I created this [`Log-Entry` framework](https://stackoverflow.com/questions/7834656/create-log-file-in-powershell), which might help you figure out exactly which part is slow. Basic use: replace everything *in* the `Main {...}` function with your script. Replace all `Write-Host` commands with the `Log` command (and possibly add some more `Log`s in your script). Check the time stamps in the `%Temp%\.log` file. – iRon Aug 22 '17 at 14:16
  • Have you tried adding timers to this to see where the time is being spent? That would be my first suggestion to diagnose what's going on. Maybe reduce the file count so that things finish inside five minutes (maybe use 4-5 files instead of 435) and see how much time is spent on (a) GetChildItem (b) GoParseAndInsert / First INSERT (c) Outer ForEach (d) Inner ForEach (e) Second INSERT. You might also count how many times (e) happens. That information will tell you where it makes sense to optimize. – Patrick87 Aug 22 '17 at 14:18
  • Here is an example of how a few lines of the text files look: root:x:0:root bin:x:1:root,bin,daemon daemon:x:2:root,bin,daemon sys:x:3:root,bin,adm adm:x:4:root,adm,daemon tty:x:5: disk:x:6:root lp:x:7:daemon,lp mem:x:8: kmem:x:9: wheel:x:10:root mail:x:12:mail news:x:13:news uucp:x:14:uucp – Scrat50 Aug 22 '17 at 15:13
  • Each file has about 33,028 lines of this. – Scrat50 Aug 22 '17 at 15:16
  • I ran Measure-Command on the whole script and it takes 47 seconds to complete from start to end, for one file. – Scrat50 Aug 22 '17 at 15:23
  • You might try using `System.Diagnostics.StopWatch`, wrapping interesting parts in `.Start()` and `.Stop()` and adding up `.Elapsed.TotalMilliseconds` to see which parts of the code are taking the longest. The way these things often go is that you find 99% of the code takes 1% of the time, and 1% of the code takes 99% of the time - at least, when there are useful optimizations to be made. – Patrick87 Aug 22 '17 at 15:55
  • I am not usually one to just point people at third party options, but you may want to consider [CleanCode](https://sourceforge.net/projects/cleancode/) if for no other reason than to use their [Write-DataTable](http://cleancode.sourceforge.net/api/powershell/CleanCode/SqlTools/Write-DataTable.html) command. – TheMadTechnician Aug 22 '17 at 21:40
  • Better yet, [here](https://gallery.technet.microsoft.com/scriptcenter/Import-Large-CSVs-into-SQL-216223d9) is a page explaining how to perform bulk updates to SQL tables without the need for third party dependencies! I would recommend this as a better option, but am leaving my last comment as a lazy man's way out (hey, we're all lazy sometimes). – TheMadTechnician Aug 22 '17 at 21:43
  • Did you find a good solution to your question? If so, please consider sharing it as an answer and marking it as solved. – Dennis May 31 '23 at 21:19
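
To make the batching, command-reuse and transaction suggestions above concrete, here is a minimal sketch. It assumes `$connection` is an already-open `System.Data.SqlClient.SqlConnection`, that the files follow the `group:x:gid:members` layout shown in the sample, and that the column sizes and the batch size of 500 are only illustrative:

$tx  = $connection.BeginTransaction()
$cmd = $connection.CreateCommand()
$cmd.Transaction = $tx
$cmd.CommandText = "INSERT INTO [dbo].[GroupPasswordAge] ([Server],[Group],[UserID]) VALUES (@Server,@Group,@UserID)"
[void]$cmd.Parameters.Add("@Server", [System.Data.SqlDbType]::VarChar, 255)
[void]$cmd.Parameters.Add("@Group",  [System.Data.SqlDbType]::VarChar, 255)
[void]$cmd.Parameters.Add("@UserID", [System.Data.SqlDbType]::VarChar, 4000)

$batchSize = 500
$rowCount  = 0

foreach ($File in $Files) {
    foreach ($Thing in (Get-Content $File)) {
        $parts = $Thing.Split(":")
        $cmd.Parameters["@Server"].Value = $File.BaseName
        $cmd.Parameters["@Group"].Value  = $parts[0]
        $cmd.Parameters["@UserID"].Value = $parts[3]
        [void]$cmd.ExecuteNonQuery()

        # commit periodically so one huge transaction does not build up
        $rowCount++
        if (($rowCount % $batchSize) -eq 0) {
            $tx.Commit()
            $tx = $connection.BeginTransaction()
            $cmd.Transaction = $tx
        }
    }
}
$tx.Commit()

Reusing one command object with bound parameters also drops the per-line `Write-Host` and string building, which are themselves measurable costs in the original inner loop.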
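
For the Stopwatch suggestion, a short sketch; the two stages measured here are just placeholders for whichever parts of the script you want to time:

$swRead   = New-Object System.Diagnostics.Stopwatch
$swInsert = New-Object System.Diagnostics.Stopwatch

foreach ($File in $Files) {
    $swRead.Start()
    $StringContent = Get-Content $File
    $swRead.Stop()

    $swInsert.Start()
    # ... the per-line parsing and INSERT work goes here ...
    $swInsert.Stop()
}

Write-Host ("Reading files : {0:N0} ms" -f $swRead.Elapsed.TotalMilliseconds)
Write-Host ("Inserting rows: {0:N0} ms" -f $swInsert.Elapsed.TotalMilliseconds)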
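
And a rough sketch of the bulk-load route TheMadTechnician mentions above, using `System.Data.SqlClient.SqlBulkCopy` directly with no third-party module; it assumes `$connection` is open and that the destination columns really are named Server, Group and UserID:

$table = New-Object System.Data.DataTable
[void]$table.Columns.Add("Server", [string])
[void]$table.Columns.Add("Group",  [string])
[void]$table.Columns.Add("UserID", [string])

$bulk = New-Object System.Data.SqlClient.SqlBulkCopy($connection)
$bulk.DestinationTableName = "dbo.GroupPasswordAge"
[void]$bulk.ColumnMappings.Add("Server", "Server")
[void]$bulk.ColumnMappings.Add("Group",  "Group")
[void]$bulk.ColumnMappings.Add("UserID", "UserID")

foreach ($File in $Files) {
    foreach ($Thing in (Get-Content $File)) {
        $parts = $Thing.Split(":")
        [void]$table.Rows.Add($File.BaseName, $parts[0], $parts[3])
    }
    $bulk.WriteToServer($table)   # one bulk copy per file keeps memory bounded
    $table.Clear()
}

Pushing whole files to the server in one bulk copy instead of one ExecuteNonQuery per line is usually the single biggest win for this kind of load.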

1 Answer


Maybe also try a multi-threaded approach: a "thread" per text file?

https://blogs.technet.microsoft.com/uktechnet/2016/06/20/parallel-processing-with-powershell/
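
A rough sketch of what that could look like with background jobs, one per file, throttled so only a few run at once; the connection string, column sizes and the throttle of 8 are placeholders:

$MaxJobs = 8    # throttle; 435 simultaneous jobs would overwhelm the machine

$jobs = foreach ($File in $Files) {
    # wait until a slot is free before starting the next job
    while ((Get-Job -State Running).Count -ge $MaxJobs) { Start-Sleep -Milliseconds 200 }

    Start-Job -ArgumentList $File.FullName, $File.BaseName -ScriptBlock {
        param($FilePath, $ServerName)

        # each job opens its own connection; the connection string is a placeholder
        $conn = New-Object System.Data.SqlClient.SqlConnection("Server=.;Database=PasswordAge;Integrated Security=True")
        $conn.Open()
        $cmd = $conn.CreateCommand()
        $cmd.CommandText = "INSERT INTO [dbo].[GroupPasswordAge] ([Server],[Group],[UserID]) VALUES (@Server,@Group,@UserID)"
        [void]$cmd.Parameters.Add("@Server", [System.Data.SqlDbType]::VarChar, 255)
        [void]$cmd.Parameters.Add("@Group",  [System.Data.SqlDbType]::VarChar, 255)
        [void]$cmd.Parameters.Add("@UserID", [System.Data.SqlDbType]::VarChar, 4000)

        foreach ($line in Get-Content $FilePath) {
            $parts = $line.Split(":")
            $cmd.Parameters["@Server"].Value = $ServerName
            $cmd.Parameters["@Group"].Value  = $parts[0]
            $cmd.Parameters["@UserID"].Value = $parts[3]
            [void]$cmd.ExecuteNonQuery()
        }
        $conn.Close()
    }
}

$jobs | Wait-Job | Out-Null
$jobs | Remove-Job

Keep in mind that each Start-Job spawns a separate powershell.exe process, so the startup overhead is noticeable; runspace pools, or combining this with a bulk-copy approach, would likely scale better.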