4

So I've been assaulting the internet all day looking for a way to combine multiple CSV files. I keep running into an issue, no matter which of the 30+ PowerShell approaches I've looked up.

I'm trying to combine multiple CSV files into one, essentially in "full join" style. I need to end up with all rows and all columns from all CSVs combined, with the exception that I want to combine rows based on a common identifier. This discussion: "Merging two CSV files by shared column", does exactly what I'm looking to do with two exceptions. First it's only built for two CSVs and second it drops rows if both CSVs don't contain the "Name". I'd like to keep the row even if it's not in both CSVs and simply create blank entries where there is no data in the other CSV.

CSV1.csv

Name,Attrib1,Attrib2

VM1,111,True
VM2,222,False

CSV2.csv

Name,AttribA,Attrib1

VM1,AAA,111
VM3,CCC,333

CSV3.csv

Name,Attrib2,AttribB

VM2,False,YYY
VM3,True,ZZZ

Desired combined result:

Name,Attrib1,Attrib2,AttribA,AttribB

VM1,111,True,AAA,
VM2,222,False,,YYY
VM3,333,True,CCC,ZZZ

Anyone have any ideas on this one? If you need more info from my end just let me know.

Update: Here's my current code attempt with the SQLite shell:

$db  = Join-Path $env:TEMP 'temp.db'
$dir = "C:\Users\UserName\Downloads\CSV Combination"
$outfile = Join-Path $dir 'combined.csv'

@"
CREATE TABLE a (Name varchar(20),OS varchar(20),IP varchar(20),Contact varchar(20),Application varchar(20));
CREATE TABLE b (Name varchar(20));
CREATE TABLE c (Name varchar(20),Quiesce varchar(20));
CREATE TABLE d (Name varchar(20),NoQuiesce varchar(20));
.mode csv
.import '$((Join-Path $dir csv1.csv) -replace '\\', '\\')' a
.import '$((Join-Path $dir csv2.csv) -replace '\\', '\\')' b
.import '$((Join-Path $dir csv3.csv) -replace '\\', '\\')' c
.import '$((Join-Path $dir csv4.csv) -replace '\\', '\\')' d
SELECT a.Name,a.OS,a.IP,a.Contact,a.Application,c.Quiesce,d.NoQuiesce
FROM a
  LEFT OUTER JOIN b ON a.Name = b.Name
  LEFT OUTER JOIN c ON a.Name = c.Name
  LEFT OUTER JOIN d ON a.Name = d.Name
UNION
SELECT b.Name,a.OS,a.IP,a.Contact,a.Application,c.Quiesce,d.NoQuiesce
FROM b
  LEFT OUTER JOIN a ON a.Name = b.Name
  LEFT OUTER JOIN c ON b.Name = c.Name
  LEFT OUTER JOIN d ON c.Name = d.Name
UNION
SELECT c.Name,a.OS,a.IP,a.Contact,a.Application,c.Quiesce,d.NoQuiesce
FROM c
  LEFT OUTER JOIN a ON a.Name = c.Name
  LEFT OUTER JOIN b ON b.Name = c.Name
  LEFT OUTER JOIN d ON c.Name = d.Name;
"@ | filesystem::"C:\Users\UserName\Downloads\CSV Combination\sqlite3.exe" $db >$outfile

Remove-Item $db

This currently returns the following error message:

sqlite3.exe : Error: C:\Users\brandon.andritsch\Downloads\CSV Combination\csv1.csv line 1: expected 5 columns of data but found 6

iRon
  • 20,463
  • 10
  • 53
  • 79
user2600210
  • 41
  • 1
  • 3
  • You might use the [`[Join-Object] cmdlet`](https://www.powershellgallery.com/packages/Join) from the [PowerShell Gallery](https://www.powershellgallery.com/) for this: `Import-CSV .\CSV1.csv | Join (Import-CSV .\CSV2.csv) Hosts {$Right.$_} | Join (Import-CSV .\CSV3.csv) Hosts {$Right.$_} | Export-CSV .\combined.csv` – iRon Aug 12 '18 at 07:36
  • Does this answer your question? [In PowerShell, what's the best way to join two tables into one?](https://stackoverflow.com/questions/1848821/in-powershell-whats-the-best-way-to-join-two-tables-into-one) – TylerH May 23 '22 at 13:14

3 Answers3

2

I have created a Join-Object proxy command called Merge-Object (alias Merge) as it appeared that merging objects slightly similar to the SQL MERGE statement is often used. The default parameters for the Merge-Object command are set to: JoinType = 'Full' and Property= {{If ($Null -ne $RightIndex) {$Right.$_} Else {$Left.$_}}}}. Meaning that all the left objects are updated with the right property values and right objects that do not exist in left object list are added to the result:

Import-Csv CSV1.csv | 
Merge (Import-Csv CSV2.csv) -On Name | 
Merge (Import-Csv CSV3.csv) -On Name |
Format-Table

Result:

Name Attrib1 Attrib2 AttribA AttribB
---- ------- ------- ------- -------
VM1  111     True    AAA
VM2  222     False           YYY
VM3  333     True    CCC     ZZZ
iRon
  • 20,463
  • 10
  • 53
  • 79
1

you could use a great commandline tool: Miller (http://johnkerl.org/miller/doc/).

With

mlr --csv reshape -r "^A" -o item,value then reshape -s item,value \
then unsparsify --fill-with ""  *.csv

You have

Name,Attrib1,Attrib2,AttribA,AttribB
VM1,111,True,AAA,
VM2,222,False,,YYY
VM3,333,True,CCC,ZZZ

As input I have used these 3 files

Name,Attrib1,Attrib2
VM1,111,True
VM2,222,False

Name,Attrib2,AttribB
VM2,False,YYY
VM3,True,ZZZ

Name,AttribA,Attrib1
VM1,AAA,111
VM3,CCC,333

There is also the win exe https://github.com/johnkerl/miller/releases

Some notes about the command:

  • reshape -r "^A" -o item,value, to transform the input CSVs from wide to long, applying this to all the fields whose name begins with "A";
  • reshape -s item,value, to transform the previous output from long to wide;
  • unsparsify --fill-with "", to manage field names over all input records. For field names absent in a given record but present in others, fills in the value "".
aborruso
  • 4,938
  • 3
  • 23
  • 40
  • can you please explain more about the miller command that you wrote? What is this command doing? – Fariman Kashani Jun 02 '22 at 07:12
  • hi @FarimanKashani, the command answers the question above. I have added some notes at the end of my reply. What do you want to know? Are you trying to use it and it doesn't work for you? – aborruso Jun 02 '22 at 08:07
  • 1
    Thanks @aborruso the notes you put answered my questions. I used it and it worked. Just wanted to know what I'm doing exactly so I can explain to my teammates. – Fariman Kashani Jun 02 '22 at 08:12
0

Try this:

$db  = Join-Path $env:TEMP 'temp.db'
$dir = "C:\some\folder"
$outfile = Join-Path $dir 'combined.csv'

@"
CREATE TABLE a (Name varchar(20),Attrib1 varchar(20),Attrib2 varchar(20));
CREATE TABLE b (Name varchar(20),AttribA varchar(20),Attrib1 varchar(20));
CREATE TABLE c (Name varchar(20),Attrib2 varchar(20),AttribB varchar(20));
.mode csv
.import '$((Join-Path $dir csv1.csv) -replace '\\', '\\')' a
.import '$((Join-Path $dir csv2.csv) -replace '\\', '\\')' b
.import '$((Join-Path $dir csv3.csv) -replace '\\', '\\')' c
SELECT a.Name,a.Attrib1,a.Attrib2,b.AttribA,c.AttribB
FROM a
  LEFT OUTER JOIN b ON a.Name = b.Name
  LEFT OUTER JOIN c ON a.Name = c.Name
UNION
SELECT b.Name,a.Attrib1,a.Attrib2,b.AttribA,c.AttribB
FROM b
  LEFT OUTER JOIN a ON a.Name = b.Name
  LEFT OUTER JOIN c ON b.Name = c.Name
UNION
SELECT c.Name,a.Attrib1,a.Attrib2,b.AttribA,c.AttribB
FROM c
  LEFT OUTER JOIN a ON a.Name = c.Name
  LEFT OUTER JOIN b ON b.Name = c.Name;
"@ | sqlite3 $db >$outfile

Remove-Item $db

You need the SQLite command-line shell for this.

Ansgar Wiechers
  • 193,178
  • 25
  • 254
  • 328
  • Hey Ansgar, thanks for the script on this, I don't know that I'll be able to get SQLite setup, I don't have the permissions to actually place it in the System32 dir. I was able to get it to run (I think) using local profile directory, here's the error I had returned to me: sqlite3.exe : Error: unknown command or invalid arguments: "import". Enter ".help" for help – user2600210 Jul 19 '13 at 18:45
  • The command-line shell is a Zip archive with a single executable. Simply extract it to a directory in your `%PATH%`. As for the error: the command is `.import`, not `import`. – Ansgar Wiechers Jul 19 '13 at 18:51
  • ``.import $((Join-Path $dir csv1.csv) -replace '\\', '\\') a`` ``.import $((Join-Path $dir csv2.csv) -replace '\\', '\\') b`` ``.import $((Join-Path $dir csv3.csv) -replace '\\', '\\') c`` These were the commands I used - I'm not sure what %PATH% refers to, I don't see a variable following that and when it's located in the current working directory it errors with the following: The term 'sqlite3.exe' is not recognized as the name of a cmdlet, function, script file, or operable program. Also is there no way to format this comment text? This is awful looking. – user2600210 Jul 19 '13 at 18:58
  • Ok, I've updated with what I'm seeing. I added a fourth CSV into the mix. One of the concerns I'd have with this method is I'll need to build a custom script whenever the attributes or CSVs change. This will be something I may need to do often and in different ways. I also added in a link to a Discussion I had mentioned that works correctly minus the two details. – user2600210 Jul 19 '13 at 19:28
  • Also, I noticed that I butchered that code in the middle and skipped a portion when I rewrote. I think I get the idea though, I'll see if I can manage to unbutcher that. (I'm not very familiar with coding, I'm in my early learning phase haha). – user2600210 Jul 19 '13 at 19:53
  • Your `$dir` contains a space, so you need to put the path in the `.import` command between quotes (both single and double quotes will work). See updated answer. – Ansgar Wiechers Jul 19 '13 at 20:00
  • Thanks for the update, I completely missed that. This works, but I now get the following error: `sqlite3.exe : Error: C:\\Users\\brandon.andritsch\\Downloads\\CSV Combination\\csv1.csv line 1: expected 5 columns of data but found 6` – user2600210 Jul 19 '13 at 20:06
  • The error message is quite self-explanatory. You declared a table with only 5 fields, but your CSV has 6 columns. The fields of the tables must match the respective fields of the CSVs (number *and* names). – Ansgar Wiechers Jul 19 '13 at 20:11
  • Unfortunately it won't let me upvote you, the answer is useful, and I was finally able to get information back from the script just never combined - it always comes back separated with multiple rows per "Name", and with added quotations on some entries. Also oddly it doesn't seem to read as a CSV no matter what I do. In any case, I'm lost on this language and wouldn't be able to modify it to suit my needs without help so it's probably best if I move back into the PowerShell realm. Thanks again for the help though, your skill is beyond mine. =) – user2600210 Jul 22 '13 at 12:44