1

i Have 2 CSV's

left.csv

Ref_ID,First_Name,Last_Name,DOB
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
887327716,User3,Aco,06/26/1950
588496260,User4,John,05/23/1960
565465465,User5,Jack,07/08/2020

right.csv

First_Name,Last_Name,DOB,City,Document_Type,Filename
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
User3,Aco,06/26/1950,,Transcript,R4IKTHMK.pdf
User4,John,05/23/1960,,Letter,R4IKTHSL.pdf

End Results:

Combined.csv

Ref_ID,First_Name,Last_Name,DOB,Document_Type,Filename
321364060,User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
946497594,User2,Acker,05/28/1960,Transcript,R4IKTRYN.pdf
887327716,User3,Aco,06/26/1950,Transcript,R4IKTHMK.pdf
588496260,User4,John,05/23/1960,Letter,R4IKTHSL.pdf

I need to match them on First_Name,Last_Name,DOB then return Ref_ID, first_name, last_name, DOB from the left.csv and Document_Type,Filename from the right.csv

Use Compare-Object: that only returns columns from one of the csvs, not columns from both.

Use join-object: This was my great hope, but that only lets me match on one Column, I need to match multiple Columns (cant figure out how to do Multiple)

Im not sure where to go from here, open to suggestions.

iRon
  • 20,463
  • 10
  • 53
  • 79
moore1emu
  • 476
  • 8
  • 27
  • Does this answer your question? [In PowerShell, what's the best way to join two tables into one?](https://stackoverflow.com/questions/1848821/in-powershell-whats-the-best-way-to-join-two-tables-into-one) – TylerH May 23 '22 at 13:17

5 Answers5

2
$left = Import-Csv C:\left.csv
$right = Import-Csv C:\right.csv

Compare-Object -ReferenceObject $left -DifferenceObject $right -Property First_Name,Last_Name,DOB -IncludeEqual -ExcludeDifferent | 
    ForEach-Object {
        $iItem = $_
        $ileft = $left.Where({$_.First_Name -eq $iItem.First_Name -and $_.Last_Name -eq $iItem.Last_Name -and$_.DOB -eq $iItem.DOB})
        $iright = $right.Where({$_.First_Name -eq $iItem.First_Name -and $_.Last_Name -eq $iItem.Last_Name -and$_.DOB -eq $iItem.DOB})
        [pscustomobject]@{
            Ref_ID=$ileft.Ref_ID
            first_name=$ileft.first_name
            last_name=$ileft.last_name
            DOB=$ileft.DOB
            Document_Type=$iright.Document_Type
            Filename=$iright.Filename
        }
    } | Export-Csv C:\Combined.csv -NoTypeInformation
Nas
  • 1,243
  • 6
  • 7
  • Nicely done, just a couple things. Use `-Passthru` to just pass the matched objects down the pipeline to the `ForEach` loop. Then within the loop use `Compare-Object` again the other way to get the right side object, and simply add the missing properties to the left side object that you passed down. Then `Select * -Exclude SideIndicator` before you export to CSV. Lots of shortening to fit, but this works **Next Comment For Code** – TheMadTechnician Sep 08 '18 at 01:38
  • 1
    `Compare $left $right -Prop First_Name,Last_Name,DOB -Incl -Excl -PassThru|%{$iRight = Compare $right $_ -Property First_Name,Last_Name,DOB -Incl -Excl -PassThru;$_|Add-Member 'Document_Type' $iRight.Document_Type;$_|Add-Member 'FileName' $iRight.FileName -PassThru}|Select * -Exclude SideIndicator|Export-Csv C:\Combined.csv -NoType` – TheMadTechnician Sep 08 '18 at 01:39
  • @TheMadTechnician could you put your code in an Answer? I cant quite make out all the changes you made through the comments. – moore1emu Sep 08 '18 at 01:59
  • @Nas this looks awesome, easy to read, and pretty adaptable to future changes i may need. Ill try it out tomorrow and if it works, ill mark it as the accepted answer – moore1emu Sep 08 '18 at 02:06
  • Gave you the correct answer, This is easy to read, easy to adapt, and it supports mutliple columns and even if the headers don't match. Awesome work – moore1emu Sep 10 '18 at 15:10
  • Had to add AliasProperty to do non matching headers for the Compare-Object part of the query – moore1emu Sep 11 '18 at 12:54
  • @Nas how would i do this if Left has 1 Row with First, Last, DOB and Right has 3 Rows with the same First, Last, DOB, but different File Names? – moore1emu Sep 11 '18 at 15:44
  • You want to print those on one line and combine the file names or print three different lines? – Nas Sep 11 '18 at 22:13
  • In the first case add `-join ','` after every line in the pscustomobject hashtable where you expect multiple values. In the latter add this `$iright | ForEach-Object { }` to enclose `[pscustomobject]@{...}` and replace `$iright` with `$_` in `[pscustomobject]@{...}` – Nas Sep 11 '18 at 22:28
1

You could create you own key from each csv, then add from each csv to a new hashtable using this key.

Step through this in a debugger (ISE or VSCode) and tailor it to what you need... Add appropriate error checking as you need depending on the sanity of your data. Some statements below are just for debugging so you can inspect what's happening as it runs.

# Ref_ID,First_Name,Last_Name,DOB
$csv1 = @'
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
887327716,User3,Aco,06/26/1950
588496260,User4,John,05/23/1960
565465465,User5,Jack,07/08/2020
'@

# First_Name,Last_Name,DOB,City,Document_Type,Filename
$csv2 = @'
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
User3,Aco,06/26/1950,,Transcript,R4IKTHMK.pdf
User4,John,05/23/1960,,Letter,R4IKTHSL.pdf
'@

# hashtable
$data = @{}

$c1 = $csv1 -split "`r`n"
$c1.count

foreach ($item in $c1)
{
    $fields = $item -split ','
    $key = $fields[1]+$fields[2]+$fields[3]
    $key

    # add new hashtable for given key
    $data.Add($key, [ordered]@{})

    # add data from c1 to the hashtable
    $data[$key].ID = $fields[0]
    $data[$key].First = $fields[1]
    $data[$key].Last = $fields[2]
    $data[$key].DOB = $fields[3]
}

$c2 = $csv2 -split "`r`n"
$c2.count

foreach ($item in $c2)
{
    $fields = $item -split ','
    $key = $fields[0]+$fields[1]+$fields[2]
    $key

    # add data from c2 to the hashtable
    $data[$key].Type = $fields[4]
    $data[$key].FileName = $fields[5]
}

$data.Count

foreach ($key in $data.Keys)
{
    '====================='
    $data[$key]
}
Kory Gill
  • 6,993
  • 1
  • 25
  • 33
1

Try this Join-Object.
It has a few more features along with joining based on multiple columns:

$Left = ConvertFrom-Csv @"
Ref_ID,First_Name,Last_Name,DOB
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
887327716,User3,Aco,06/26/1950
588496260,User4,John,05/23/1960
565465465,User5,Jack,07/08/2020
"@

$Right = ConvertFrom-Csv @"
First_Name,Last_Name,DOB,City,Document_Type,Filename
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
User3,Aco,06/26/1950,,Transcript,R4IKTHMK.pdf
User4,John,05/23/1960,,Letter,R4IKTHSL.pdf
"@

$Left | Join $Right `
    -On First_Name, Last_Name, DOB `
    -Property Ref_ID, Filename, First_Name, DOB, Last_Name `
    | Format-Table

Last_Name    Ref_ID DOB                    Filename     First_Name
---------    ------ ---                    --------     ----------
Micah     321364060 1969-11-01 12:00:00 AM T4IJZSYO.pdf User1
Acker     946497594 1960-05-28 12:00:00 AM R4IKTRYN.pdf User2
Aco       887327716 1950-06-26 12:00:00 AM R4IKTHMK.pdf User3
John      588496260 1960-05-23 12:00:00 AM R4IKTHSL.pdf User4
iRon
  • 20,463
  • 10
  • 53
  • 79
  • You are a Gentlemen and a Scholar...and the Owner of that module, thank you so much. – moore1emu Sep 08 '18 at 15:54
  • what if the Column Headers names dont match? Like one is "First" and the other is "First_Name". In sql i would just do L.First = R.First_Name – moore1emu Sep 10 '18 at 13:09
  • I imported the Script and i am trying it out doing this command: $L = Import-Csv 'c:\Left.csv' $R = import-Csv 'c:\Right.csv' $L | Join $R First_Name, Last_Name, DOB | Select-Object Ref_ID, First_Name, Last_Name, DOB, Document_Type, Filename but i am not getting any results. I made sure the headers match and the file formats are identical, but still no results. Just using a simple Compare-object i do get several results. – moore1emu Sep 10 '18 at 13:24
  • For the first question: for single columns matches you can use the `-Equal` parameter, e.g.: `$Left | Join $Right First -Equals First_Name`. For multiple column matches where the names do not match, you might use an expression: `$Left | Join $Right {$Left.First -eq $Right.First_Name -and $Left.Last_Name -eq $Right.Last_Name -and $Left.DOB -eq $Right.DOB}`. The second (`-On` alias `-Using`) parameter is similar to the SQL parameter where it will act like the SQL `Using` clause if you supply an expression. *Note that using expressions is quiet expensive and might impact the performance.* – iRon Sep 10 '18 at 13:53
  • For the second question, I have changed my answer to reflect you specific table names and headers, excluding the contents of your files. (If you can find the difference, I suggest that you do the same and change your question reflecting your test data using the [`ConvertFrom-Csv`](https://learn.microsoft.com/powershell/module/microsoft.powershell.utility/convertfrom-csv?view=powershell-6) cmdlet) – iRon Sep 10 '18 at 14:19
  • I tried copying your query with the two arrays and pasting it into ISE and i still get 0 results. I installed the object-join using: install-script join and i had it set the enivronment path, but is there something else i need to do invoke the script? I just get blank results everytime. I think it might be still trying to use the join-object module i had installed. I uninstalled it, but still no results or error messsages – moore1emu Sep 10 '18 at 14:43
  • The easiest way is [`Dot Sourcing`](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_scripts?view=powershell-6#script-scope-and-dot-sourcing) the script, e.g.: `. .\Join-Object.ps1` – iRon Sep 10 '18 at 14:58
  • I have changed the `Join-Object` code a little (reflected in the answer) to support a list (`[String[]]`) of property names for the `-Property` parameter (previous versions only accept a hashtable for the `-Property` parameter in the form of: `@{ = { – iRon Sep 11 '18 at 14:24
1

Some good answers already, and here's another.

Import your myriad objects into a single (dis)array:

$left = @"
Ref_ID,First_Name,Last_Name,DOB
321364060,User1,Micah,11/01/1969
946497594,User2,Acker,05/28/1960
887327716,User3,Aco,06/26/1950
588496260,User4,John,05/23/1960
565465465,User5,Jack,07/08/2020
"@

$right = @"
First_Name,Last_Name,DOB,City,Document_Type,Filename
User1,Micah,11/01/1969,Parker,Transcript,T4IJZSYO.pdf
User2,Acker,05/28/1960,,Transcript,R4IKTRYN.pdf
User3,Aco,06/26/1950,,Transcript,R4IKTHMK.pdf
User4,John,05/23/1960,,Letter,R4IKTHSL.pdf
"@

$disarray = @(
    $left | ConvertFrom-Csv 
    $right | ConvertFrom-Csv
)

Use Group-Object to organize them into groups having identical key values:

$keyProps = @('First_Name', 'Last_name', 'DOB')
$disarray | 
    Group-Object -Property $keyProps | 
    Where-Object Count -gt 1 |

Then merge the objects, adding any missing properties to the output $mergedObject

    ForEach-Object {
        $mergedObject = $_.group[0]
        foreach ($obj in $_.group[1..($_.group.count-1)]) {
            $newProps = ($obj | Get-Member -MemberType NoteProperty).name | 
                Where-Object {
                    $_ -notin ($mergedobject | Get-Member -MemberType NoteProperty).name
                } 
            foreach ($propName in $newProps) {
                $mergedObject | Add-Member -MemberType NoteProperty -Name $propName -Value $obj.$propName -Force
            }
        }
        Write-Output $mergedObject
    }

This doesn't differ wildly from the answers you already have, but eliminating the "left" "right" distinction might be helpful; The above code should handle three or more sources thrown into $disarray, merging all objects containing identical $keyProps.

Note that there are corner cases to consider. For instance, what happens if one object has 'City=Chigago' for a user and another has 'City=New York'?

veefu
  • 2,820
  • 1
  • 19
  • 29
  • Is there a way to do it that it would return both values in separate rows? like one with Chicago and one with New York? I am most in if File A has three rows for the same person and different filenames, but File B only has the name once. how do I return all three rows? – moore1emu Sep 11 '18 at 15:28
0

adding answer i found:

$left = Import-Csv .\left.csv
$right = Import-Csv .\right.csv

$right | foreach { 
    $r = $_; 
    $left | where{ $_.First_Name -eq $r.First_Name -and $_.Last_Name -eq $r.Last_Name -and $_.DOB -eq $r.DOB } | 
        select Ref_Id, 
            First_Name, 
            Last_Name, 
            DOB, 
            @{Name="City";Expression={$r.City}}, 
            @{Name="Document_Type";Expression={$r.Document_Type}}, 
            @{Name="FileName";Expression={$r.FileName}}
} | format-table
moore1emu
  • 476
  • 8
  • 27
  • This is an easy answer to write and understandr, but the run-time complexity will be bad with large datasets. The code iterated over all of `$left` elements for every element of `$right`, resulting in N^2 complexity when right and left are the same size, N. – veefu Sep 13 '18 at 20:07
  • @veefu, that was one of my thoughts, what would you consider a large dataset in powershell? Couple hundred rows or a couple thousand rows? Do you have a suggestion for something clean and simple like this that would not have that problem? I like this solution because if i have multiple records for the same user with different file names and only one record with the ref_id it still matches up and outputs each row with the correct ref_id and the different file names. – moore1emu Sep 14 '18 at 13:49
  • I can't put a hard number on what a "large" data set is; it depends on the hardware. I also don't know the overhead powershell adds to space or time requirements. To be pragmatic about this, use `Measure-Command` to measure performance of your script with sample data sets. If performance falls short of requirements, replace the code with one of the other answers offered; they will all certainly perform better. What you like about this answer (the `ref_id` and `filename` stuff) can certainly be replicated within one of the more efficient answers. – veefu Sep 17 '18 at 14:44