I have a question similar to this one but with a twist: Powershell Group Object in CSV and exporting it

My file has 42 existing headers. The delimiter is a standard comma, and there are no quotation marks in this file.

master_account_number,sub,txn,cur,last,first,address,address2,city,state,zip,ssn,credit,email,phone,cell,workphn,dob,chrgnum,cred,max,allow,neg,plan,downpayment,pmt2,min,clid,cliname,owner,merch,legal,is_active,apply,ag,offer,settle_perc,min_pay,plan2,lstpmt,orig,placedate

The file's data (the first 6 columns) looks like this:

master_account_number,sub,txn,cur,last,first
001,12,35,50.25,BIRD, BIG
001,34,47,100.10,BIRD, BIG
002,56,9,10.50,BUNNY, BUGS
002,78,3,20,BUNNY, BUGS
003,54,7,250,DUCK, DAFFY
004,44,88,25,MOUSE, JERRY

I am only working with the first column, master_account_number, and the 4th column, cur. I want to check for duplicates in the "master_account_number" column. If any are found, I want to add up the 4th column "cur" for only those duplicates and then combine the rows that were summed into a single row. The summed value should replace the cur value in the combined row.

With that said, the output should look like this:

master_account_number,sub,txn,cur,last,first
001,12,35,150.35,BIRD, BIG
002,56,9,30.50,BUNNY, BUGS
003,54,7,250,DUCK, DAFFY
004,44,88,25,MOUSE, JERRY

Now that we have that out of the way, here is how this question differs: I want to keep all 42 columns intact in the output file. In the other question I referenced above, the input had 5 columns and the output had 4, which is not what I'm trying to achieve. With so many more headers, I'd hate to have to specify all 42 columns individually. That seems inefficient anyhow.

As for what I have so far for code... not much.

$revNB = "\\server\path\example.csv"
$global:revCSV = import-csv -Path $revNB | ? {$_.is_active -eq "Y"}
$dupesGrouped = $revCSV | Group-Object master_account_number | Select-Object @{Expression={ ($_.Group|Measure-Object cur -Sum).Sum }}

Ultimately I want the output to look identical to the input, except that rows with duplicate account numbers are merged into one row whose cur field contains the sum of the grouped cur values.

Last Update: I tried Rich's solution and got an error, so I modified it to this:

$dupesGrouped = $revCSV | Group-Object master_account_number | Select-Object Name, @{Name='curSum'; Expression={ ($_.Group | Measure-Object cur -Sum).Sum}}

This gets me exactly what my own code got me, so I am still looking for a solution. I need to output this CSV with all 42 headers, even for items with no duplicates.

Other things I've tried: the following doesn't give me the data I need. The columns are there, but they are blank.

$dupesGrouped = $revCSV | Group-Object master_account_number | Select-Object @{ expression={$_.Name}; label='master_account_number' },
sub_account_number,
charge_txn,
@{Name='current_balance'; Expression={ ($_.Group | Measure-Object current_balance -Sum).Sum },
last,
}
shadow2020

2 Answers

You're pretty close, but you used current_balance where you probably meant cur.

Here's a start:

# Each calculated property is its own hashtable, so each one needs its own closing brace.
$dupesGrouped = $revCSV | Group-Object master_account_number |
    Select-Object Name,
        @{N='curSum'; E={ ($_.Group | Measure-Object cur -Sum).Sum } },
        @{N='last';   E={ ($_.Group | Select-Object last -First 1).last } }

You can add the other fields by adding a Name/Expression hashtable for each field you want in the output. I assumed you would want the first occurrence of a repeated last name for the same master_account_number. The output will be incorrect if the last name differs between rows with the same master_account_number.
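As a rough sketch of how those hashtables could be generated instead of typed out, the column names can be read from the first imported row and turned into one calculated property each. The sample rows, `$columns`, `$props`, and `$merged` below are illustrative names that are not part of the original post, and the amounts are whole numbers only to keep the example easy to follow. It assumes header names contain no single quotes.

```powershell
# Sample rows standing in for the real file (assumption: 6 of the 42 columns)
$rows = @'
master_account_number,sub,txn,cur,last,first
001,12,35,50,BIRD,BIG
001,34,47,100,BIRD,BIG
002,56,9,10,BUNNY,BUGS
'@ | ConvertFrom-Csv

# Column names in their original order, read from the first row
$columns = $rows[0].psobject.Properties.Name

# One calculated property per column:
# 'cur' is summed across the group, every other column takes the first row's value.
$props = foreach ($col in $columns) {
    if ($col -eq 'cur') {
        @{ N = 'cur'; E = { ($_.Group | Measure-Object cur -Sum).Sum } }
    }
    else {
        # Build the script block from text so each one refers to its own column name
        @{ N = $col; E = [scriptblock]::Create("`$_.Group[0].'$col'") }
    }
}

$merged = $rows | Group-Object master_account_number | Select-Object $props
```

Because the property list is built from the header row, the same code should carry over to a file with more columns without listing them by hand.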

Rich Moss
  • Looks close but doesn't work due to error: `Select-Object : The "E" key has a type, System.Object[], that is not valid; expected types are {System.String, System.Management.Automation.ScriptBlock}.` – shadow2020 Jul 23 '19 at 20:57
  • Also I don't really care which of the dupe lines is kept but I want the "curSum" to go into the "cur" field after they are merged. After that I'm doing no more grouping. – shadow2020 Jul 23 '19 at 21:31
  • I'm not able to reproduce your error. What is your PowerShell version? `$PSVersionTable` – Rich Moss Jul 23 '19 at 22:52
  • It's version 5.1 – shadow2020 Jul 23 '19 at 23:11
  • Since the structure of the data has changed after grouping, you have to select from the group. Each of the 42 columns will need its own N/E block: `@{N='sub'; E={ ($_.Group | Select-Object sub -first 1).sub} }, @{N='txn'; E={ ($_.Group | Select-Object txn -first 1).txn} }` There may be an easier way, but I don't know it. – Rich Moss Jul 23 '19 at 23:13
  • So I did try that but I'm getting no data. I get the columns but they're empty. – shadow2020 Jul 23 '19 at 23:19

If you only need to change part of the data, you can also copy the first row of each group and overwrite just the cur value:

$dupesGrouped = $revCSV | Group-Object master_account_number | ForEach-Object {

    # copy the first data in order not to change original data
    $new = $_.Group[0].psobject.Copy()    

    # update the value of cur property
    $new.cur = ($_.Group | Measure-Object cur -Sum).Sum

    # output
    $new                                  
}
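
For illustration, here is a self-contained sketch of the same approach run end to end on a few sample rows and converted back to CSV text. The sample data, the `$lines` variable, and the output path are assumptions for the example, and the amounts are whole numbers for readability. Since the source file has no quotation marks, the sketch strips the quotes that `ConvertTo-Csv` adds by default, which is only safe when no value contains a comma or a quote.

```powershell
# Sample rows standing in for the real 42-column file (assumption)
$revCSV = @'
master_account_number,sub,txn,cur,last,first
001,12,35,50,BIRD,BIG
001,34,47,100,BIRD,BIG
002,56,9,10,BUNNY,BUGS
003,54,7,250,DUCK,DAFFY
'@ | ConvertFrom-Csv

$merged = $revCSV | Group-Object master_account_number | ForEach-Object {
    # Copy the first row of the group so every column is kept
    $new = $_.Group[0].psobject.Copy()

    # Replace cur with the sum over all rows in the group
    $new.cur = ($_.Group | Measure-Object cur -Sum).Sum

    $new
}

# Convert back to CSV text and remove the quotes added by default.
# Only safe when no field contains a comma or a quote.
$lines = $merged | ConvertTo-Csv -NoTypeInformation | ForEach-Object { $_ -replace '"', '' }

# Hypothetical output path; uncomment to write the file
# $lines | Set-Content -Path '.\merged.csv'
```

Accounts without duplicates form a group of one, so they pass through with their original cur value and all columns intact.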
rokumaru