3

I have a solution for this but I believe it is not the best method as it takes forever so I am looking for a faster/better/smarter way.

I have multiple pscustomObject objects pulled from .csv files. Each object has at least one common property. One is relatively small (around 200-300 items/lines in the object) but the other is sizable (around 60,000-100,000 items). The contents of one may or may not match the contents of the other.

I need to find where the two objects match on a specific property and then combine the properties of each object into one object with all or most properties.

An example snippet of the code (not exact but for this it should work - see the image for the sample data): DataTables

Write-Verbose "Pulling basic Fruit data together"
$Purchase = import-csv "C:\Purchase.csv"
$Selling = import-csv "C:\Selling.csv"

Write-Verbose "Combining Fruit names and removing duplicates"
$Fruits = $Purchase.Fruit
$Fruits += $Selling.Fruit
$Fruits = $Fruits | Sort-Object -Unique

$compareData = @()

Foreach ($Fruit in $Fruits) {
        $IndResults = @()
        $IndResults = [pscustomobject]@{
        #Adding Purchase and Selling data
        Farmer = $Purchase.Where({$PSItem.Fruit -eq $Fruit}).Farmer
        Region = $Purchase.Where({$PSItem.Fruit -eq $Fruit}).Region
        Water = $Purchase.Where({$PSItem.Fruit -eq $Fruit}).Water
        Market = $Selling.Where({$PSItem.Fruit -eq $Fruit}).Market
        Cost = $Selling.Where({$PSItem.Fruit -eq $Fruit}).Cost
        Tax = $Selling.Where({$PSItem.Fruit -eq $Fruit}).Tax
        }
    Write-Verbose "Loading Individual results into response"
    $CompareData += $IndResults
}

Write-Output $CompareData

I believe the issue is in lines like these:

Farmer = $Purchase.Where({$PSItem.Fruit -eq $Fruit}).Farmer

If I understand this it is looking through the $Purchase object each time it goes through this line. I am looking for a way to speed that whole process up instead of having it look through the entire object for each match attempt.

iRon
  • 20,463
  • 10
  • 53
  • 79
J.C.
  • 143
  • 1
  • 8
  • You could do the `.where()` once for Purchase and once for Selling. Store that in an variable then set the three properties from that variable. Something like: `$PurchaseItem = $Purchase.Where({$PSItem.Fruit -eq $Fruit})` then in the object `Farmer = $PurchaseItem.Farmer` – BenH Jan 25 '17 at 21:27
  • Right after I posted I had a similar idea of shortening the initial search object before the assignments with a line like this:$SearchData = $Purchase | Where ({$Fruits -contains $_.Fruit}) and then changing the assignment object in the foreach and that does seem to shorten it greatly. That sounds like the same concept. It isn't perfect but it makes a big difference. – J.C. Jan 25 '17 at 21:53
  • 1
    I would say that powershell is not that effective in such operations in general. So if that is a repetitive task with large data amounts - I would load these csv files to any database - sqllite, mssqlExpress etc. Then do the task using single line sql: select p.*, s.Market, s.Cost, s.Tax from Purchases as p inner join Selling as s on p.Fruit = s.Fruit – Anton Krouglov Jan 26 '17 at 07:32

4 Answers4

2

Using this Join-Object:

$Purchase | Join $Selling -On Fruit | Format-Table

Result (using Simon Catlin's data):

Fruit      Farmer  Region     Water Market  Cost Tax
-----      ------  ------     ----- ------  ---- ---
Apple      Adam    Alabama    1     MarketA 10   0.1
Cherry     Charlie Cincinnati 2     MarketC 20   0.2
Damson     Daniel  Derby      3     MarketD 30   0.3
Elderberry Emma    Eastbourne 4     MarketE 40   0.4
Fig        Freda   Florida    5     MarketF 50   0.5
iRon
  • 20,463
  • 10
  • 53
  • 79
1

I had this very problem when trying to consolidate employee data from our HR system against employee data in our AD forest. With many thousands of rows, the process was taking an age.

I eventually walked away from custom objects and reverted to old school hash tables.

The hash tables entries themselves then held a sub-hash table with the data. In your instance, the outer hash would be keyed on $fruit, with the sub-hash containing the various attributes, e.g.: farmer, region, Etc.

Hash tables are lightning quick in comparison. It's a shame that PowerShell is slow in this regard.

Shout if you need more info.

26/01 Example code... assuming I'm correctly understanding the requirement:


PURCHASE.CSV:

Fruit,Farmer,Region,Water
Apple,Adam,Alabama,1
Cherry,Charlie,Cincinnati,2
Damson,Daniel,Derby,3
Elderberry,Emma,Eastbourne,4 
Fig,Freda,Florida,5

SELLING.CSV

Fruit,Market,Cost,Tax
Apple,MarketA,10,0.1
Cherry,MarketC,20,0.2
Damson,MarketD,30,0.3
Elderberry,MarketE,40,0.4
Fig,MarketF,50,0.5

CODE

[String]       $Local:strPurchaseFile    = 'c:\temp\purchase.csv';
[String]       $Local:strSellingFile     = 'c:\temp\selling.csv';
[HashTable]    $Local:objFruitHash       = @{};
[System.Array] $Local:objSelectStringHit = $null;
[String]       $Local:strFruit           = '';

if ( (Test-Path -LiteralPath $strPurchaseFile -PathType Leaf) -and (Test-Path -LiteralPath $strSellingFile -PathType Leaf) ) {

    #
    # Populate data from purchase file.
    #
    foreach ( $objSelectStringHit in (Select-String -LiteralPath $strPurchaseFile -Pattern '^([^,]+),([^,]+),([^,]+),([^,]+)$' | Select-Object -Skip 1) ) {
        $objFruitHash[ $objSelectStringHit.Matches[0].Groups[1].Value ] = @{ 'Farmer' = $objSelectStringHit.Matches[0].Groups[2].Value;
                                                                             'Region' = $objSelectStringHit.Matches[0].Groups[3].Value;
                                                                             'Water'  = $objSelectStringHit.Matches[0].Groups[4].Value;
                                                                           };
        } #foreach-purchase-row

    #
    # Populate data from selling file.
    #
    foreach ( $objSelectStringHit in (Select-String -LiteralPath $strSellingFile -Pattern '^([^,]+),([^,]+),([^,]+),([^,]+)$' | Select-Object -Skip 1) ) {
        $objFruitHash[ $objSelectStringHit.Matches[0].Groups[1].Value ] += @{ 'Market' = $objSelectStringHit.Matches[0].Groups[2].Value;
                                                                              'Cost'   = [Convert]::ToDecimal( $objSelectStringHit.Matches[0].Groups[3].Value );
                                                                              'Tax'    = [Convert]::ToDecimal( $objSelectStringHit.Matches[0].Groups[4].Value );
                                                                            };
        } #foreach-selling-row

    #
    # Output data.  At this point, you could now build a PSCustomObject.
    #
    foreach ( $strFruit in ($objFruitHash.Keys | Sort-Object) ) {
        Write-Host -Object ( '{0,-15}{1,-15}{2,-15}{3,-10}{4,-10}{5,10:C}{6,10:P}' -f 
                                         $strFruit,
                                         $objFruitHash[$strFruit]['Farmer'],
                                         $objFruitHash[$strFruit]['Region'],
                                         $objFruitHash[$strFruit]['Water'],
                                         $objFruitHash[$strFruit]['Market'],
                                         $objFruitHash[$strFruit]['Cost'],
                                         $objFruitHash[$strFruit]['Tax']
                           );
        } #foreach

} else {
    Write-Error -Message 'File error.';
} #else-if
Simon Catlin
  • 2,141
  • 1
  • 13
  • 15
  • i was trying this solution, but i started drowning so switched to join-object. can you share a coded solution based on OP's sample data? – Anthony Stringer Jan 25 '17 at 21:35
  • I am not sure I understand how they could be used to put that together given that the data isn't a simple Name=Value. If a sub-hash goes into the Value, would that value look something like this ( `@{Oranges = @{Farmer=Bob,Region=West,Water=Ok,Market=High,Cost=1,Tax=Yes}}` – J.C. Jan 25 '17 at 23:03
1

using Join-Object

http://ramblingcookiemonster.github.io/Join-Object/

Join-Object -Left $purchase -Right $selling -LeftJoinProperty fruit -RightJoinProperty fruit -Type OnlyIfInBoth | ft
Anthony Stringer
  • 1,981
  • 1
  • 10
  • 15
0

I needed to do this myself for something similar. I wanted to take two system array objects and compare them pulling out the matches without having to manipulate the input data each time. Here's the method I used, which although I appreciate this is inefficient, it was instantaneous for the 200 or so records I had to work with.

I tried to translate what I was doing (users and their old and new home directories) into farmers, fruit and markets etc so I hope it makes sense!

$Purchase = import-csv "C:\Purchase.csv"
$Selling = import-csv "C:\Selling.csv"

$compareData = @()
foreach ($iPurch in $Purchase) {
    foreach ($iSell in $Selling) {
        if ($iPurch.fruit -match $iSell.fruit) {
            write-host "Match found between $($iPurch.Fruit) and $($iSell.Fruit)"
            $hash = @{
                Fruit           =   $iPurch.Fruit
                Farmer          =   $iPurch.Farmer
                Region          =   $iPurch.Region
                Water           =   $iPurch.Water
                Market          =   $iSell.Market
                Cost            =   $iSell.Cost
                Tax             =   $iSell.Tax
            }
            $Build = New-Object PSObject -Property $hash
            $Total = $Total + 1
            $compareData += $Build
            }
        }
    }
Write-Host "Processed $Total records"
iamkl00t
  • 189
  • 1
  • 5