1

I have a CSV containing 300K+ records. Using this as a base, I need to extract the value from the Login column, feed it to the Get-ADUser command, and extract the reporting manager and one other custom property for each user. The 2 additional values then need to be appended to a copy of this large CSV.

Unfortunately I have had no luck optimising it.

I tried the hashtable approach:

  1. Since the Login column of the CSV has multiple rows containing the same user ID (each row corresponds to a different DB role the user has), I extracted a list of unique user IDs. Out of 300K records, I got some 1700-1800 unique user IDs.

  2. Then I created a hashtable

  $f1 = Import-Csv -Path $CSVFilePath/$file -Delimiter '|' -Header Login,Name,Role,System,Permission
  $h2 = @{}
  $f1 | ForEach-Object {
      $h2[$_.Login] = $_
  }
  3. Then I tried updating the hashtable entries with the custom properties:
$f1 | ForEach-Object {
    $Login = $_.Login
    $Array | ForEach-Object {
        if ($h2.ContainsKey($_.userId)) {
            $h2[$Login] | Add-Member -Name "REPORTING_MANAGER" -Value $_.Manager -MemberType NoteProperty -Force
            $h2[$Login] | Add-Member -Name "CUSTOM_COLUMN" -Value $_.CustomProp -MemberType NoteProperty -Force
        }
    }
}

But the last loop is taking considerable time, and I'm not sure how it can be made faster. Help!

EDIT

Here are some details I forgot to include:

$uniqueLogin = Import-Csv -Path $CSVFilePath/$file -Delimiter '|' | Sort-Object Login -Unique | Select-Object Login

$uniqueUsers = $uniqueLogin | Select-Object -Skip 1
$uniqueUsers.Count
$Array = @()
$uniqueUsers | ForEach-Object {

    $userid = $_.Login

    $userProperties = Get-ADUser $userid -Server xxx -Properties * | Select-Object Manager, extensionAttribute7
    if ($userProperties.Manager -ne $null) {
        $manager = ($userProperties.Manager.Split(',')[0] + $userProperties.Manager.Split(',')[1]).Replace('\', ',').Replace('CN=', '')
    }
    if ($userProperties.extensionAttribute7 -ne $null) {
        $customprop = $userProperties.extensionAttribute7
    }
    $ResultObject = [PSCustomObject]@{ # return object
        userId     = $userid
        Manager    = $manager
        Customprop = $customprop
    }
    $Array += $ResultObject
}

So basically $Array is a collection of objects of this kind:

[
    {
        "userId":  "xxxx",
        "Manager":  "yyyy",
        "Customprop":  "test1"
    },
    {
        "userId":  "yyyyyy",
        "Manager":  "zzzzz",
        "Customprop":  "test2"
    }
]

SAMPLE SOURCE CSV

(screenshot: pipe-delimited CSV with Login, Name, Role, System, Permission columns)

NEED EDITED CSV LIKE THIS

(screenshot: the same rows with REPORTING_MANAGER and CUSTOM_COLUMN columns appended)

  • @ZachYoung do you mean dumping $Array | Export-CSV? Actually the problem is the f1 CSV contains 300K records where one user ID may correspond to 100 rows; the end goal is to have a copy of the f1 CSV – Just Another Developer Mar 17 '23 at 07:36
  • When you use a hashtable, it should normally avoid an embedded loop (see the sketch after these comments): [Use a HashTable to lookup values or objects](https://github.com/iRon7/PowerShell-Docs/blob/iRon7---HashTable-Performance/reference/docs-conceptual/dev-cross-plat/performance/script-authoring-considerations.md#use-a-hashtable-to-lookup-values-or-objects). – iRon Mar 17 '23 at 08:08
  • Anyways, if you do not want to go to the hassle of reinventing the wheel of joining two lists (and the pitfalls and performance considerations), you might use this [`Join-Object script`](https://www.powershellgallery.com/packages/Join)/[`Join-Object Module`](https://www.powershellgallery.com/packages/JoinModule) (see also: [In Powershell, what's the best way to join two tables into one?](https://stackoverflow.com/a/45483110/1701026)): `import-csv -path $CSVFilePath/$file -Delimiter '|' -Header Login,Name,Role,System,Permission |Join $Array -on Login -eq userId |Export-Csv .\Output.Csv` – iRon Mar 17 '23 at 08:09
  • 1
    Hi. I retracted my questions 'cause I realized I'm out of my depth regarding PowerShell and Windows admin. Still... I was asking if you could provide a **small** sample of the CSVs, so that someone like me who doesn't know the problem domain might begin to understand the inputs and outputs... the process. – Zach Young Mar 17 '23 at 08:12
  • Sure @ZachYoung no worries, will provide a CSV shortly – Just Another Developer Mar 17 '23 at 08:17
  • @iRon will give this a try and come back..Thanks! – Just Another Developer Mar 17 '23 at 08:18
  • How many user IDs do you have? If it's large, you should create a hash table of user IDs. Right now, with M user IDs and N rows in the CSV, your algorithm is taking M*N/2. With a hash table that will drop to log(M)*N. – jdweng Mar 17 '23 at 09:42
  • @jdweng I have 1700-1800 unique user IDs but my CSV does not have unique values in the Login column. I basically have an array of unique user IDs; now my challenge is that against these I have to append 2 custom columns to the source CSV, which has 300K records – Just Another Developer Mar 17 '23 at 10:09
  • What is `$Array` exactly? Please post sample CSV and provide a complete snippet of code :) – Mathias R. Jessen Mar 17 '23 at 10:25
  • @MathiasR.Jessen Thanks for this question, realized I missed some details. Added in the EDIT – Just Another Developer Mar 17 '23 at 10:38
  • Let's say you have 2048 IDs, which is 2^11. Right now your time is 2048/2 (1024) for finding an ID. Using a hash table the time would be 11, so your saving would be 11/1024 of the run time, which reduces runtime by ~99%. – jdweng Mar 17 '23 at 10:40
  • @jdweng please advise on how to achieve lesser time – Just Another Developer Mar 17 '23 at 10:45
  • Added a sample CSV as well, the source as well as the needed output – Just Another Developer Mar 17 '23 at 10:46
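For reference, the single-pass lookup pattern iRon's link describes would look roughly like this against the question's own variables. This is only a sketch; $byUserId is an illustrative name, and $f1 and $Array are assumed to be populated as shown above.

# Build a lookup keyed by userId once, instead of scanning $Array for every CSV row
$byUserId = @{}
foreach ($entry in $Array) {
    $byUserId[$entry.userId] = $entry
}

# Single pass over the 300K CSV rows; each lookup is a constant-time hash probe
foreach ($row in $f1) {
    $match = $byUserId[$row.Login]
    if ($match) {
        $row | Add-Member -NotePropertyName REPORTING_MANAGER -NotePropertyValue $match.Manager -Force
        $row | Add-Member -NotePropertyName CUSTOM_COLUMN -NotePropertyValue $match.CustomProp -Force
    }
}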

2 Answers

2

You'll want to use the same trick for the manager lookup - turn the array into a dictionary, at which point you can simply look up the corresponding manager and custom properties:

$lookupTable = @{}

# Build the AD lookup once, keyed on the Login value
Import-Csv -Path $CSVFilePath/$file -Delimiter '|' -Header Login,Name,Role,System,Permission |
  Sort-Object Login -Unique |
  Select-Object -Skip 1 -ExpandProperty Login |
  ForEach-Object {
    $userProperties = Get-ADUser $_ -Server xxx -Properties manager,extensionAttribute7

    $manager = if ($userProperties.manager) {
      # Take the first RDN of the manager DN (split on unescaped commas), keep the value after 'CN=', drop escape backslashes
      ($userProperties.manager -split "(?<!\\),")[0].Split('=')[1].Replace('\', '')
    } else {
      '<No manager found>'
    }

    $lookupTable[$_] = [pscustomobject]@{
      manager    = $manager
      customProp = $userProperties.extensionAttribute7
    }
  }

Now it's as simple as piping the csv data to Select-Object:

Import-Csv -Path $CSVFilePath/$file -Delimiter '|' -Header Login,Name,Role,System,Permission |
  Select-Object *,
    @{Name = 'Manager';    Expression = { $lookupTable[$_.Login].manager }},
    @{Name = 'CustomProp'; Expression = { $lookupTable[$_.Login].customProp }} |
  Export-Csv $CSVFilePath/output.csv -NoTypeInformation
Mathias R. Jessen
  • I have a query: in the last command, where we are using the lookup table, how is the mapping between the Login of the source CSV and the lookup table achieved? – Just Another Developer Mar 17 '23 at 11:31
  • 1
    @JustAnotherDeveloper we're using the same CSV file as input in both cases so the key value is the same - no need for additional mappings :) – Mathias R. Jessen Mar 17 '23 at 11:32
  • There is no need to build a hash table from the CSV file. You just need to use the login hash table that already exists. See my solution. – jdweng Mar 17 '23 at 11:53
  • 1
    @MathiasR.Jessen this works! I tried it on my real data! Thanks a ton mate! – Just Another Developer Mar 17 '23 at 12:02
  • @jdweng perhaps I'm misunderstanding something, but it sounds like OP wants to amend _all 300K records_ - so you'll need to either do 2 passes over the CSV (like in my example), or you'll have to dynamically populate the hashtable as you go along – Mathias R. Jessen Mar 17 '23 at 12:03
  • Look at my solution. The OP built a hash table of logins from the JSON inputs, which is H2. So all you need to do is enumerate through the CSV file, look up the login in the h2 table, and then add two new columns to the CSV table. The OP had two for loops instead of just looking up the value in H2. You build a second hash table of the CSV which is not needed. You basically did the opposite of what I did by enumerating through the logins and then looking up the CSV hash table. Your code will take longer when all the logins are not used. – jdweng Mar 17 '23 at 12:16
  • @jdweng What JSON input? The sample data in OP's post is derived from the CSV – Mathias R. Jessen Mar 17 '23 at 12:20
  • The login data is JSON. See code above with variable $ARRAY. – jdweng Mar 17 '23 at 13:02
  • This is not a good solution. Let's assume each login occurs 128 times (~262,144 / 2,048). When M is the number of logins (2,048 = 2^11) and N is the size of the CSV (262,144 = 2^18), you get log(M) * N << M * log(N), for example 11 * 262,144 << 128 * 2,048 * 18, which is 2,883,584 << 4,718,592. – jdweng Mar 17 '23 at 13:38
  • @jdweng What in the world are you talking about? This is not O(log(m)*n), hashtable lookups are expected to return in O(1). At which point do I process each record log(m) times? – Mathias R. Jessen Mar 17 '23 at 14:46
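To make the complexity argument in this thread concrete, here is a rough, hypothetical timing sketch comparing the two approaches; it assumes $f1 and $Array from the question and simply uses Measure-Command.

$nested = Measure-Command {
    # Nested-loop version: every CSV row scans the whole $Array
    foreach ($row in $f1) {
        foreach ($entry in $Array) {
            if ($entry.userId -eq $row.Login) { $null = $entry.Manager }
        }
    }
}

$hashed = Measure-Command {
    # Hashtable version: build the dictionary once, then do constant-time lookups
    $byId = @{}
    foreach ($entry in $Array) { $byId[$entry.userId] = $entry }
    foreach ($row in $f1) { $null = $byId[$row.Login] }
}

"Nested: {0:N0} ms  Hashtable: {1:N0} ms" -f $nested.TotalMilliseconds, $hashed.TotalMilliseconds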
0

You built the hash table and then didn't use it. It should be:

$f1 | ForEach-Object {
    $Login = $h2[$_.Login]
    $_ | Add-Member -Name "REPORTING_MANAGER" -Value $Login.Manager -MemberType NoteProperty -Force
    $_ | Add-Member -Name "CUSTOM_COLUMN" -Value $Login.CustomProp -MemberType NoteProperty -Force
}
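Since the stated end goal is an edited copy of the source CSV, the amended $f1 would presumably still need to be written back out. A minimal sketch, assuming the same illustrative output path as the other answer:

# Write the amended rows out as the edited copy of the pipe-delimited source CSV
$f1 | Export-Csv -Path "$CSVFilePath/output.csv" -Delimiter '|' -NoTypeInformation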
jdweng