0

I need to combine a slew of Excel spreadsheets. I used PowerSHell to convert them to CSVs and now need to merge them, but not as you typically would. The merge doesn't use a join. If I have 3 files with 100 rows each, my new file should have 300 rows. So, this is more if a UNION than a JOIN to use database terms.

Some of the columns do have the same name. Some don't. If they have the same name, a new column shouldn't be created. Is there a way to do this without manually having to list out all the columns as properties?

Example (with only 2 files)

File1:

Name Address 
Bob  123 Main

File2:

Name City
Bob  LA
Tom  Boston

Results

Name  Address City
Bob   123 Main
Bob           LA
Tom           Boston
Matt
  • 45,022
  • 8
  • 78
  • 119
user1612851
  • 1,144
  • 4
  • 18
  • 32
  • How do you know which properties to keep? Just the ones shared by all files? Have you tried anything for this yet? Answering the first question is the answer to this issue. – Matt Oct 07 '15 at 18:58
  • I want all properties from all files. I tried using a join-object function I found online and doing a full join, but it didn't work. – user1612851 Oct 07 '15 at 19:01
  • Is there something you have tried for this.... even the smallest effort would look good in the question. – Matt Oct 07 '15 at 19:11
  • I wrote the code to convert the excel files to csvs and to cycle through and import the csvs. I have no clue how to do the merging yet. I've done this before with joining files, but never like this. – user1612851 Oct 07 '15 at 19:13
  • You can simply concatenate the files: `$File3 = $File1, $File2` and $File3 will contain all the objects (including the properties, prove: `$File3[1].City`) but if you pipe `$File3`, most cmdlets only look to the first object to define the concerned properties (including piping it to the output!). Therefore you will need to Union `$File3 = $File1, $File2 | Union-Object`, see: https://stackoverflow.com/a/44429084/1701026 – iRon Feb 09 '19 at 16:10
  • Possible duplicate of [Not all properties displayed](https://stackoverflow.com/questions/44428189/not-all-properties-displayed) – iRon Feb 09 '19 at 16:11

1 Answers1

0

At the end of the day this might not be sorted right. The trick here is to read the header of each file and collect it as a string array and remove and of the duplicates.

This code assumes all the files are in the same location. If not you will need to account for that.

$files = Get-ChildItem -Path 'C:\temp\csv\' -Filter '*.csv' | Select-Object -ExpandProperty FullName

# Gather the headers for all the files. 
$headers = $files | ForEach-Object{
    (Get-Content $_ -Head 1).Split(",") | ForEach-Object{$_.Trim()}
} | Sort-Object -Unique

# Loop again now and read in the csv files as objects
$files | ForEach-Object{
    Import-Csv $_
} | Select-Object $headers 

The output would look like this:

Address  City   Name
-------  ----   ----
123 Main        Bob 
         LA     Bob 
         Boston Tom 
Matt
  • 45,022
  • 8
  • 78
  • 119
  • I will give it a try. I was thinking that was the right strategy of reading all the headers first. – user1612851 Oct 07 '15 at 19:34
  • I think this works! Thanks. Can you think of an easy way to combine columns with different names using this technique? For example, if the column name is Name, Nm, or Fullname put those all in a column called name. I know I can do a Select at the end and use expressions. Not sure if that would be the best way. – user1612851 Oct 07 '15 at 19:42
  • @user1921849 I have an idea of a way but it is not in the scope of this question. You can ask a new question if you want about it and reference this one. One way would be to change the header of the file and resave based on known variances. I can think of another dynamic one but again way out of scope for this question. – Matt Oct 07 '15 at 20:05
  • Thanks. I figured it out. I just add whatever expression to $headers and remove the items I don't need. Like $headers+= @{Name="Extra";Expression={$_."Business Line"}} – user1612851 Oct 07 '15 at 20:09