-1

I'm looking for a solution to compare 2 .csv files and compare the results.

The first .csv file is an monthly backup size in KB on based client name. The second .csv file is an next monthly backup size in KB on based client name.

It lists all the Client Name in column A. Column B has the corresponding policy name of client and last column backup size in KB (i.e. - 487402463).

If the difference between client size (1638838488 - 1238838488 = 0.37 in TB ) is greater than 0.10 TB , the results will be spit out in TB size to a csv file like below.

Also , a client may be related multiple policy name.

My question is : I want to add something too. Sometimes it may be duplicate client and policy name such as hostnameXX,Company_Policy_XXX or case-sensitive HOSTNAMEXX,Company_Policy_XXX. additionally, lets say , if hostnameYY,Company_Policy_XXX,41806794 does not exist in CSV2 then I want to display as negative like below.

I used the Join-Object module.

Example CSVFile1.csv

Client Name,Policy Name,KB Size
hostname1,Company_Policy,487402463
hostname2,Company_Policy,227850336
hostname3,Company_Policy_11,8360960
hostname4,Company_Policy_11,1238838488
hostname1,Company_Policy_55,521423110
hostname10,Company_Policy,28508975
hostname3,Company_Policy_66,295925
hostname5,Company_Policy_22,82001824
hostname2,Company_Policy_33,26176885
hostnameXX,Company_Policy_XXX,0
hostnameXX,Company_Policy_XXX,41806794
hostnameYY,Company_Policy_XXX,41806794

Example CSVFile2.csv

Client Name,Policy Name,KB Size
hostname1,Company_Policy,487402555
hostname2,Company_Policy,227850666
hostname3,Company_Policy_11,8361200
hostname4,Company_Policy_11,1638838488
hostname1,Company_Policy_55,621423110
hostname10,Company_Policy,28908975
hostname3,Company_Policy_66,295928
hostname5,Company_Policy_22,92001824
hostname2,Company_Policy_33,36176885
hostname22,Company_Policy,291768854
hostname23,Company_Policy,291768854

Desired Output :

Client Name,Policy Name,TB Size
hostname4,Company_Policy_11,0.37
hostname22,Company_Policy,0.27
hostname23,Company_Policy,0.27
hostnameYY,Company_Policy_XXX,-0.03
hostnameXX,Company_Policy_XXX,-0.04
Arbelac
  • 1,698
  • 6
  • 37
  • 90

2 Answers2

1

Using this Join-Object cmdlet (see also: what's the best way to join two tables into one?):

$CSV2 | FullJoin $CSV1 `
    -On 'Client Name','Policy Name' `
    -Property 'Client Name',
              'Policy Name', 
              @{'TB Size' = {[math]::Round(($Left.'KB Size' - $Right.'KB Size') * 1KB / 1TB, 2)}} | 
    Where-Object  {[math]::Abs($_.'TB Size') -gt 0.01}

Result:

Client Name Policy Name        TB Size
----------- -----------        -------
hostname4   Company_Policy_11    -0.37
hostname1   Company_Policy_55    -0.09
hostnameXX  Company_Policy_XXX    0.04
hostnameYY  Company_Policy_XXX    0.04
hostname22  Company_Policy       -0.27
hostname23  Company_Policy       -0.27

Update 2019-11-24

Improved -Where parameter which will now also apply to outer joins.
You can now use the -Where parameter instead of the Where-Object cmdlet for these type of queries, e.g.:

$Actual = $CSV2 | FullJoin $CSV1 `
    -On 'Client Name','Policy Name' `
    -Property 'Client Name',
              'Policy Name', 
              @{'TB Size' = {[math]::Round(($Left.'KB Size' - $Right.'KB Size') / 1GB, 2)}} `
    -Where {[math]::Abs($Left.'KB Size' - $Right.'KB Size') -gt 100MB}

The advantage of using the -Where parameter is that there is a slight performance improvement as some output objects aren't required to be created at all.

Note 1: The -Where parameter applies to the $Left and $Right objects that represent respectively each $LeftInput and $RightInput object and not the Output Object. In other words you can't use e.g. the calculated TB Size property in the -Where expression for this example.

Note 2: The $Right object always exists in a Left Join or full join even if there is no relation. In case there is no relation, all properties of the $Right object will be set to $Null. The same applies to the $Left object in a right join or full join.

iRon
  • 20,463
  • 10
  • 53
  • 79
  • Thanks But how can I integrate my script your solution ? – Arbelac Nov 14 '19 at 11:18
  • ili101's script is just a different cmdlet/solution. The installation of my version is quiet simple: download the script from https://www.powershellgallery.com/packages/Join. Simply [dotsource](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_scripts?view=powershell-6#script-scope-and-dot-sourcing) the script like: `. .\Join.ps1` to invoke it and run the command as displayed above... – iRon Nov 14 '19 at 12:52
  • Ok I got it. Also , I need to export to CSV file. – Arbelac Nov 14 '19 at 13:01
  • No problem, just append something like: `... | Export-Csv C:\Toolbox\DataReport.csv -NoTypeInformation` – iRon Nov 14 '19 at 13:17
  • I noticed a possibility too. I will update my question. – Arbelac Nov 14 '19 at 14:18
  • Invoking the cmdlet using [dot sourcing](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_scripts?view=powershell-6#script-scope-and-dot-sourcing) (e.g. `. .\Jpin.ps1`) will just make commands available like `Fulljoin`. See [readme](https://github.com/iRon7/Join-Object/blob/master/README.md) for more (proxy) commands. – iRon Nov 14 '19 at 14:20
  • There are some shortcomings. I have posted as different question. can you please assist me ? https://stackoverflow.com/questions/58864302/join-object-two-different-csv-files-using-powershell – Arbelac Nov 14 '19 at 19:23
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/202362/discussion-between-iron-and-arbelac). – iRon Nov 14 '19 at 19:36
-1

I have never used the Join-Object module, so I wrote it using standard cmdlets.

$data1 = Import-Csv "CSVFile1.csv"
$data1 | ForEach-Object { $_."KB Size" = -1 * $_."KB Size" } # Convert to negative value

$data2 = Import-Csv "CSVFile2.csv"

@($data2; $data1) | Group-Object "Client Name","Policy Name" | ForEach-Object {
    $size = [Math]::Round(($_.Group | Measure-Object "KB Size" -Sum).Sum * 1KB / 1TB, 2)
    if ($size -ge 0 -and $size -lt 0.1) { return }
    [pscustomobject]@{
        "Client Name" = $_.Group[0]."Client Name"
        "Policy Name" = $_.Group[0]."Policy Name"
        "TB Size" = $size
    }
}
rokumaru
  • 1,244
  • 1
  • 8
  • 11