
I have a PowerShell script that pulls in 1.4+ million rows of data and saves them to a huge CSV file, which then gets imported into a SQL Server. I thought there might be a way to have PowerShell insert the data into SQL Server directly, but I am not sure how.

One of my concerns is that I don't want to buffer the AD results in memory and then write them all at once. I'd rather write them in batches of 1,000 or so to keep memory consumption down: get 1,000 records, save them to the SQL server, and repeat...

I have seen articles about how to get PowerShell to write to a SQL server, but they all seem to insert either all the data at once or one record at a time -- both of which seem inefficient to me.

This is the PowerShell script I have to query AD.

# the attributes we want to load
$ATTRIBUTES_TO_GET = "name,distinguishedName"

# split into an array
$attributes = $ATTRIBUTES_TO_GET.split(",")

# create a select string to be used when we want to dump the information
$selectAttributes = $attributes | ForEach-Object {@{n="AD $_";e=$ExecutionContext.InvokeCommand.NewScriptBlock("`$_.$($_.toLower())")}}

# get a directory searcher to search the GC
[System.DirectoryServices.DirectoryEntry] $objRoot = New-Object System.DirectoryServices.DirectoryEntry("GC://dc=company,dc=com")
[System.DirectoryServices.DirectorySearcher] $objSearcher = New-Object System.DirectoryServices.DirectorySearcher($objRoot)

# set properties
$objSearcher.SearchScope = "Subtree"
$objSearcher.ReferralChasing = "All"

# need to set page size otherwise AD won't return everything
$objSearcher.PageSize = 1000

# load the data we want
$objSearcher.PropertiesToLoad.AddRange($attributes)

# set the filter
$objSearcher.Filter = "(&(objectClass=group)(|(name=a*)(name=b*)))"

# get the data and export to csv
$objSearcher.FindAll() | select -expandproperty properties | select $selectAttributes | export-csv -notypeinformation -force "out.csv"
IMTheNachoMan
    Consider using SqlBulkCopy in batches. http://stackoverflow.com/questions/43679921/importing-large-xml-file-into-sql-2-5gb – Dan Guzman May 12 '17 at 23:33
  • I would argue that a [bulk insert](http://stackoverflow.com/q/96448/1630171) is about as efficient as it gets. Why would you believe otherwise? – Ansgar Wiechers May 12 '17 at 23:34
  • @DanGuzman But how would I do it in batches with a `System.DirectoryServices.SearchResultCollection` object? The only way I can think of is to iterate through the `SearchResultCollection` and do a bulk SQL insert after every X # of records? – IMTheNachoMan May 13 '17 at 02:41
  • @AnsgarWiechers I don't disagree. I just can't figure out how to do a bulk insert with a `SearchResultCollection` object without having to load the whole collection into memory. – IMTheNachoMan May 13 '17 at 02:41

4 Answers


I use Out-DataTable to convert my object array into a DataTable object type, then use Write-DataTable to bulk insert that into a database (Write-DataTable uses SqlBulkCopy to do this).

Caveats/gotchas for this (SqlBulkCopy can be a nuisance to troubleshoot):

  • Make sure your properties are of the correct type (string for varchar/nvarchar, int for any integer values; datetime can be a string as long as the format is correct and SQL Server can parse it).
  • Make sure your properties are in order and line up with the table you're inserting into, including any fields that auto-fill (incrementing ID key, RunDt, etc.).

Out-DataTable: https://gallery.technet.microsoft.com/scriptcenter/4208a159-a52e-4b99-83d4-8048468d29dd

Write-DataTable: https://gallery.technet.microsoft.com/scriptcenter/2fdeaf8d-b164-411c-9483-99413d6053ae

Usage

If I were to continue your example and skip the CSV, this is how I would do it: replace the last two lines with the code below (assuming that your object properties line up with the table perfectly, your SQL Server name is sql-server-1, the database name is org, and the table name is employees):

try {
    Write-DataTable -ServerInstance sql-server-1 -Database org -TableName employees -Data $($objSearcher.FindAll() | Select-Object -expandproperty properties | Select-Object $selectAttributes | Out-DataTable -ErrorAction Stop) -ErrorAction Stop
}
catch {
    $_
}
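
For the batching concern raised in the question, here is a minimal sketch of the same idea done in explicit chunks with SqlBulkCopy directly, so only one batch is ever held in memory. The connection string, the dbo.employees table, and its two columns (name, distinguishedName) are assumptions carried over from the example above -- adjust them to your schema:

# don't cache already-read results in the SearchResultCollection; combined with
# PageSize this keeps roughly one page / one batch in memory at a time
$objSearcher.CacheResults = $false

# reusable DataTable whose columns must match the destination table's columns
$dt = New-Object System.Data.DataTable
[void]$dt.Columns.Add("name", [string])
[void]$dt.Columns.Add("distinguishedName", [string])

$batchSize = 1000
$bulkCopy = New-Object System.Data.SqlClient.SqlBulkCopy("Server=sql-server-1;Database=org;Integrated Security=True")
$bulkCopy.DestinationTableName = "dbo.employees"

try {
    # enumerating FindAll() pulls results from AD page by page (PageSize = 1000 above)
    foreach ($result in $objSearcher.FindAll()) {
        [void]$dt.Rows.Add(
            [string]$result.Properties["name"][0],
            [string]$result.Properties["distinguishedname"][0]
        )
        if ($dt.Rows.Count -ge $batchSize) {
            $bulkCopy.WriteToServer($dt)   # flush this batch to SQL Server
            $dt.Clear()                    # reuse the DataTable for the next batch
        }
    }
    if ($dt.Rows.Count -gt 0) {
        $bulkCopy.WriteToServer($dt)       # flush the final partial batch
    }
}
finally {
    $bulkCopy.Close()
}
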
scrthq
  • but won't that first line, `$set = ...` first load everything into memory? Isn't there a way to load only X # of rows into memory, write to SQL, and then load the next X # of rows? – IMTheNachoMan May 13 '17 at 02:44
  • if your query against AD is pulling everything, then it's already loading into memory. You would need to tighten your query down, otherwise it's an irrelevant concern. I've updated mine to a one liner wrapped in a try catch statement so it's immediately loaded into an array, converted to a datatable, then bulk inserted into the database. Considering your `$objSearcher.PageSize = 1000` bit is already limiting, maybe convert that so that it bulk inserts 1000 records at a time in a loop? – scrthq May 13 '17 at 02:53
  • I will say that I use the same method to load 75,000+ rows at a time into SQL, a lot of which are nvarchar(max) fields. It can handle a big load, I think you can save a lot of time doing larger chunks at once – scrthq May 13 '17 at 03:07

Looking at your code, it looks like you come from .NET or some other language based on .NET. Have you heard of the cmdlets Get-ADUser / Get-ADGroup? They would simplify things tremendously for you.

As far as the SQL connection goes, PowerShell doesn't have any native support for it. Microsoft has made cmdlets for it, though! You just have to have SQL Server installed in order to get them... which is kind of a bummer, since SQL Server is heavy and not everyone wants to install it. It is still possible using .NET, it's just not very quick or pretty. I won't be giving advice on the cmdlets here; you can Google that. As far as .NET goes, I would start by reading some of the documentation on the System.Data.SqlClient namespace as well as some historical questions on the subject.
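
For reference, a minimal sketch of what the raw System.Data.SqlClient route looks like from PowerShell, assuming a hypothetical dbo.ADGroups table with Name and DistinguishedName columns and Windows authentication. This is the one-record-at-a-time style, so treat it as a starting point rather than something to run 1.4 million times:

$conn = New-Object System.Data.SqlClient.SqlConnection("Server=sql-server-1;Database=org;Integrated Security=True")
$conn.Open()

$cmd = $conn.CreateCommand()
$cmd.CommandText = "INSERT INTO dbo.ADGroups (Name, DistinguishedName) VALUES (@name, @dn)"
[void]$cmd.Parameters.Add("@name", [System.Data.SqlDbType]::NVarChar, 256)
[void]$cmd.Parameters.Add("@dn",   [System.Data.SqlDbType]::NVarChar, 400)

# one parameterized insert per record
$cmd.Parameters["@name"].Value = "Some Group"
$cmd.Parameters["@dn"].Value   = "CN=Some Group,OU=Groups,DC=company,DC=com"
[void]$cmd.ExecuteNonQuery()

$conn.Close()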

Finally, as you said, it would be a good idea to try to avoid overloading your RAM. The big thing here is keeping your entire script down to one single AD query; this way you avoid the troubling scenario of data changing between one query and the next. I think the best way of doing this would be to save your results straight to a file. Once you have that, you could use a bulk insert (the T-SQL BULK INSERT statement) to load the table straight from your file. The downside to this is that it doesn't allow for multiple AD properties -- at least, I don't think a plain bulk insert from that file would.

Get-ADUser "SomeParamsHere" | Out-File ADOutput.txt

If you have to have multiple AD properties and still want to keep RAM usage to a minimum... well, I toyed around with a script that would work, but it makes a few calls that read the entire file, which defeats the whole purpose. Your best option might be to save each property to a separate file and then do your whole write-to-DB thing. Example:

New-Item Name.txt
New-Item DistinguishedName.txt

Get-ADUser "SomeParamsHere" -Properties "Name,DistinguishedName" | Foreach {
    Add-Content -Path "Name.txt" -Value "$_.Name"
    Add-Content -PassThru "DistinguishedName.txt" -Value "$_.DistinguishedName"
}
  • I have heard of `Get-ADUser` but thought .NET was faster and more efficient. Is this not the case? And if I save to a file first then it's a multi-step process: get data, write to file, read from file and write to SQL. There must be a way to get data and write straight to SQL in batches? Like read 1000 rows of AD data, write to SQL, then read the next 1000 rows... – IMTheNachoMan May 13 '17 at 02:46
  • I'm not sure which is more efficient. You could test to find out with the Measure-Command cmdlet. I'd be curious to find out. But you didn't say anything about CPU efficiency in your question so I wasn't focusing on that at all. To be realistic about it though you can't have RAM and CPU efficiency. They contradict each other. You have to choose one or the other. My answer prioritizes keeping the RAM usage down. – Michael Timmerman May 13 '17 at 03:11

Store the results from the last line of your code in a variable instead of exporting them to CSV.
Then create groups of the size you want.
Using Out-DataTable and Write-DataTable, write to SQL - links are in nferrell's answer.

$res = $objSearcher.FindAll() | select -expandproperty properties | select $selectAttributes
$counter = [pscustomobject] @{ Value = 0 }
#create groups with 1000 entries each 
$groups = $res | Group-Object -Property { [math]::Floor($counter.Value++ / 1000) }
foreach ($group in $groups){
    #convert to data table
    $dt = $group.group | Out-DataTable
    $dt | Write-DataTable -Database DB -ServerInstance SERVER -TableName TABLE 
}


Adam Mnich

You're making this unnecessarily complicated. If I read your code correctly, you want all groups starting with 'a' or 'b'.

# the attributes we want to export
$attributes = 'name', 'distinguishedName'

Import-Module ActiveDirectory

Get-ADGroup -Filter {(name -like "a*") -or (name -like "b*")} -SearchBase 'dc=company,dc=com' |
    select $attributes | Export-Csv -NoTypeInformation -Force "out.csv"

Instead of using Export-Csv at the end, just pipe the output to the command which creates the SQL rows. By piping objects (instead of assigning them to a variable) you give PowerShell the ability to handle them efficiently (it will start processing objects as they come in, not buffer everything).

Unfortunately I can't help you with the SQL part.
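
To tie this together with the other answers, here is a rough sketch of what such a downstream command could look like: a hypothetical Write-SqlBatch function whose process block buffers 1,000 objects at a time and flushes each batch with the Out-DataTable / Write-DataTable helpers linked above (the sql-server-1 / org / employees names are the same assumptions used there):

function Write-SqlBatch {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)] $InputObject,
        [int]$BatchSize = 1000
    )
    begin   { $buffer = New-Object System.Collections.Generic.List[object] }
    process {
        $buffer.Add($InputObject)
        if ($buffer.Count -ge $BatchSize) {
            # flush one batch and release it before reading more objects
            $buffer | Out-DataTable | Write-DataTable -ServerInstance sql-server-1 -Database org -TableName employees
            $buffer.Clear()
        }
    }
    end     {
        if ($buffer.Count -gt 0) {
            # flush whatever is left over after the pipeline ends
            $buffer | Out-DataTable | Write-DataTable -ServerInstance sql-server-1 -Database org -TableName employees
        }
    }
}

Get-ADGroup -Filter {(name -like "a*") -or (name -like "b*")} -SearchBase 'dc=company,dc=com' |
    select $attributes | Write-SqlBatch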