How to merge 2 x CSVs with the same column but overwrite not append?

Question

I've got this one that has been baffling me all day, and I can't seem to find any search results that match exactly what I am trying to do.

I have 2 CSV files, both of which have the same columns and headers. They look like this (shortened for the purpose of this post):

"plate","labid","well"
"1013740016604537004556","none46","F006"
"1013740016604537004556","none47","G006"
"1013740016604537004556","none48","H006"
"1013740016604537004556","3835265","A007"
"1013740016604537004556","3835269","B007"
"1013740016604537004556","3835271","C007"

Each of the 2 CSVs only have some actual Lab IDs, and the 'nonexx' are just fillers for the importing software. There is no duplication ie each 'well' is only referenced once across the 2 files.

What I need to do is merge the 2 CSVs, for example the second CSV might have a Lab ID for well H006 but the first will not. I need the lab ID from the second CSV imported into the first, overwriting the 'nonexx' currently in that column.

Here is my current code:


            $CSVB = Import-CSV "$RootDir\SymphonyOutputPending\$plateID`A_Header.csv"
            Import-CSV "$RootDir\SymphonyOutputPending\$plateID`_Header.csv" | ForEach-Object {
                $CSVData = [PSCustomObject]@{
                    labid = $_.labid
                    well = $_.well
                }
                
                If ($CSVB.well -match $CSVData.wellID) {
                    write-host "I MATCH"
                    ($CSVB | Where-Object {$_.well -eq $CSVData.well}).labid = $CSVData.labid
            }
            $CSVB | Export-CSV "$RootDir\SymphonyOutputPending\$plateID`_final.csv" -NoTypeInformation
        }

The code runs but doesn't 'merge' the data, the final CSV output is just a replication of the first input file. I am definitely getting a match as the string "I MATCH" appears several times when debugging as expected.

Could you share a basic example of the 2nd Csv and how the outcome of merging both would look like ? — Santiago Squarzon, Aug 23 '22 at 02:38
The first and second CSVs look exactly the same (as above), the only difference is one might have a valid Lab ID for well G006 and the other for H006. Trying to keep it short the files are generated because a plate is run through an instrument more than once, but isn't capable of updating the existing file relating to the plate. I hope that makes sense. — BrettFK, Aug 23 '22 at 02:45
So both Csv can have "none" values, meaning you need to compare them 2 times (2 loops), 1 for Csv1 against Csv2 and vice-versa. Is that right? — Santiago Squarzon, Aug 23 '22 at 02:50
That's what I think but am not sure. So I want to take CSV1, then look at CSV2, merging only from CSV2 to CSV1 if the CSV2 cell not contains 'none' (not part of the code yet). CSV1 will then have valid Lab IDs where a valid ID exists regardless of from CSV1 or CSV2 and anything containing 'none' gets either overwritten with a lab ID or is left untouched. — BrettFK, Aug 23 '22 at 02:56
Am I right to assume that csv1 and csv2 will have identical lines except for labid? And you want to copy the labid from csv2 to csv1 if csv1.labid like "none*"? And either csv1 or csv2 will have the labid for each well/plate? — Drew, Aug 23 '22 at 02:58

score 2 · Accepted Answer · answered Aug 23 '22 at 03:13

Based on the responses in the comments of your question, I believe this is what you are looking for. This assumes that the both CSVs contain the exact same data with labid being the only difference.

There is no need to modify csv2 if we are just grabbing the labid to overwrite the row in csv1.

$csv1 = Import-Csv C:\temp\LabCSV1.csv
$csv2 = Import-Csv C:\temp\LabCSV2.csv

# Loop through csv1 rows
Foreach($line in $csv1) {
    # If Labid contains "none"
    If($line.labid -like "none*") {
        # Set rows labid to the labid from csv2 row that matches plate/well
        # May be able to remove the plate section if well is a unique value
        $line.labid = ($csv2 | Where {$_.well -eq $line.well -and $_.plate -eq $line.plate}).labid
    }
}
# Export to CSV - not overwrite - to confirm results
$csv1 | export-csv C:\Temp\LabCSV1Adjusted.csv -NoTypeInformation

Santiago Squarzon · Answer 2 · 2022-08-23T03:41:02.107

Since you need to do a bi-directional comparison of the 2 Csvs you could create a new array of both and then group the objects by their well property, for this you can use Group-Object, then filter each group if their Count is equal to 2 where their labid property does not start with none else return the object as-is.

Using the following Csvs for demonstration purposes:

Csv1

"plate","labid","well"
"1013740016604537004556","none46","F006"
"1013740016604537004556","none47","G006"
"1013740016604537004556","3835265","A007"
"newrowuniquecsv1","none123","X001"

Csv2

"plate","labid","well"
"1013740016604537004556","none48","A007"
"1013740016604537004556","3835269","F006"
"1013740016604537004556","3835271","G006"
"newrowuniquecsv2","none123","X002"

Code

Note that this code assumes there will be a maximum of 2 objects with the same well property and, if there are 2 objects with the same well, one of them must have a value not starting with none.

$mergedCsv = @(
    Import-Csv pathtocsv1.csv
    Import-Csv pathtocsv2.csv
)

$mergedCsv | Group-Object well | ForEach-Object {
    if($_.Count -eq 2) {
        return $_.Group.Where{ -not $_.labid.StartsWith('none') }
    }

    $_.Group
} | Export-Csv pathtomerged.csv -NoTypeInformation

Output

plate                  labid   well
-----                  -----   ----
1013740016604537004556 3835265 A007
1013740016604537004556 3835269 F006
1013740016604537004556 3835271 G006
newrowuniquecsv1       none123 X001
newrowuniquecsv2       none123 X002

Bit of a silly question I know, but how does one then export this to a CSV file? I'm also not seeing an output when I debug in VSCode. — BrettFK, Aug 23 '22 at 03:22
@BrettFK you can just pipe the loop to `Export-Csv` (updated my answer) — Santiago Squarzon, Aug 23 '22 at 03:25
I prefer your method as it accounts for possible unique plate/well in the dataset and it is an elegant inline. — Drew, Aug 23 '22 at 03:29
@Drew thanks, but note this method should be robust as long there are a maximum of 2 objects with the same `well` property. Based on the comments and question I would assume so. — Santiago Squarzon, Aug 23 '22 at 03:35

score 0 · Answer 3 · answered Aug 23 '22 at 07:30

If the lists are large, performance might be an issue as Where-Object (or any other where method) and Group-Object do not perform very well for embedded loops.

By indexing the second csv file (aka creating a hashtable), you have quicker access to the required objects. Indexing upon two (or more) items (plate and well) is issued here: Does there exist a designated (sub)index delimiter? and resolved by @mklement0 and zett42 with a nice CaseInsensitiveArrayEqualityComparer class.

To apply this class on Drew's helpful answer:

$csv1 = Import-Csv C:\temp\LabCSV1.csv
$csv2 = Import-Csv C:\temp\LabCSV2.csv

$dict = [hashtable]::new([CaseInsensitiveArrayEqualityComparer]::new())
$csv2.ForEach{ $dict.($_.plate, $_.well) = $_ }

Foreach($line in $csv1) {
    If($line.labid -like "none*") {
        $line.labid = $dict.($line.plate, $line.well).labid
    }
}
$csv1 | export-csv C:\Temp\LabCSV1Adjusted.csv -NoTypeInformation

How to merge 2 x CSVs with the same column but overwrite not append?

3 Answers3