Compare columns between 2 files and delete non common columns using Powershell

Question

I have a bunch of files in folder A and their corresponding metadata files in folder B. I want to loop though the data files and check if the columns are the same in the metadata file, (since incoming data files could have new columns added at any position without notice). If the columns in both files match, no action to is to be taken. If Data file has more columns than metadata file, then those columns should be deleted from incoming data file. Any help would be appreciated. Thanks!

Data file is ps_job.txt

“1”|”Tom”|”10″|”11111″|”1000″|”M”

“2”|”Ann”|”20″|”22222″|”2000″|”F”

Meta data file is ps_job_metadata.dat

“empid”|”name”|”zipcode”|”salary”

I would like my output to be

“empid”|”name”|”zipcode”|”salary”

“1”|”Tom”|”11111″|”1000″

“2”|”Ann”|”22222″|”2000″

Welcome to SO. To set your expectations you should take the [Tour] and read at least the help topic [Ask] and Please, [format you code, errors and sample input/output properly](http://meta.stackexchange.com/a/22189/248777). — Olaf, Jul 31 '20 at 20:08
When you crosspost the same question at the same time to different forums you should at least post links to the other forums along with your question to avoid people willing to you help making their work twice or more.Thanks [Powershell.org](https://powershell.org/forums/topic/compare-columns-between-2-files-and-delete-non-common-columns-and-data/) — Olaf, Jul 31 '20 at 20:10

score 0 · Answer 1 · answered Jul 31 '20 at 23:11

That's a seemingly simple question with a very complicated answer. However, I've broken down the code for what you will need to do. Here are the steps that need to happen in order for powershell to do everything you're asking of it.

Read the .dat file
Save the .dat data into an object
Read the .txt file
Save the .txt header into an object
Check for the differences
Delete the old text file (that had too many columns)
Create a new text file with the new columns

I've made some assumptions in how this looks. However, with the way I've structured the code, it should be easy enough to make modifications as necessary if my assumptions are wrong. Here are my assumptions:

The text file will always have all of the columns that the DAT file has (even though it will sometimes have more)
The dat file is structured like a text file and can be directly imported into powershell.

And here is the code, with comments. I've done my best to explain the purpose of each section, but I've written this with the expectation that you have a basic knowledge of powershell, especially arrays. If you have questions I'll do my best to answer, though I'll ask that you refer to the section of code you have questions on.


### 
### The paths. I'm sure you will have multiples of each file. However, I didn't want to attempt to pull in 
### the files with this sample code as it can vary so much in your environment. 
###

$dat = "C:\StackOverflow\thingy.dat"
$txt = "C:\stackoverflow\ps_job.txt"


###
### This is the section to process the DAT file
###


# This will read the file and put it in a variable

$dat_raw = get-content -Path $dat

# Now, let's seperate out the punctuation and give us our object 

$dat_array = $dat_raw.split("|")
$dat_object = @()

foreach ($thing in $dat_array)
{
    $dat_object+=$thing.Replace("""","")
}

###
### This is the section to process the TXT file
###

# This will read the file and put it into a variable

$txt_raw = get-content -Path $txt

# Now, let's seperate out the punctuation and give us our object

$txt_header_array = $txt_raw[0].split("|")
$txt_header_object = @()

foreach ($thing in $txt_header_array)
{
    $txt_header_object += $thing.Replace("""","")
}

###
### Now, let's figure out which columns we're eliminating (if  any)
###

$x = 0
$total = $txt_header_object.count
$to_keep = @()

While ($x -le $total)
{
    if ($dat_object -contains $txt_header_object[$x])
    {
        $to_keep += $x 
    }

    $x++
}

### Now that we know which objects to keep, we can apply the changes to each line of the text file. 
### We will save each line to a new variable. Then, once we have the new variable, we will delete 
### The existing file with a new file that has only the data we want.Note, we will only run this 
### Code if there's a difference in the files. 

if ($total -ne $to_keep.count)
{
    
    ### This first section will go line by line and 'fix' the number of columns
    
    $new_text_file = @()

    foreach ($line in $txt_raw)
    {
        if ($line.Length -gt 0)
        {
            # Blank out the array each time
            $line_array = @()

            foreach ($number in $to_keep)
            {
                $line_array += ($line.split("|"))[$number]
            }

            $new_text_file += $line_array -join "|"
        }
        else
        {
            $new_text_file +=""
        }
    }

    ### This second section will delete the original file and replace it with our good
    ### file that has been created. 

    Remove-item -Path $txt

    $new_text_file | out-file -FilePath $txt

}

Thanks for the response. I will try this over the weekend. – Syd885264 Aug 01 '20 at 00:33 — Syd885264, Aug 01 '20 at 00:33

score 0 · Answer 2 · answered Jul 31 '20 at 23:21

This small example can be a start for your solution :

$ps_job = Import-Csv D:\ps_job.txt -Delimiter '|'
$ps_job_metadata = (Get-Content D:\ps_job_metadata.txt) -split '\|'-replace '"'

foreach( $d in (Compare-Object $column $ps_job_metadata))
{
    if($d.SideIndicator -eq '<=')
    {
        $ps_job | %{ $_.psobject.Properties.Remove($d.InputObject) }
    }
}
$ps_job | Export-Csv -Path D:\output.txt -Delimiter '|' -NoTypeInformation

score 0 · Answer 3 · edited Aug 01 '20 at 11:08

I tried this and it works.

$outputFile = "C:\Script_test\ps_job_mod.dat"

$sample = Import-Csv -Path "C:\Script_test\ps_job.dat" -Delimiter '|'

$metadataLine = Get-Content -Path "C:\Script_test\ps_job_metadata.txt" -First 1

$desiredColumns = $metadataLine.Split("|").Replace("`"","")

$sample | select $desiredColumns | Export-Csv $outputFile -Encoding UTF8 -NoTypeInformation -Delimiter '|'

Doug Maurer · Answer 4 · 2020-08-01T17:27:54.217

Here's the same answer I posted to your question on Powershell.org

$jobfile    = "ps_job.dat"
$metafile   = "ps_job_metadata.dat"
$outputfile = "some_file.csv"

$meta = ((Get-Content $metafile -First 1 -Encoding UTF8) -split '\|')

Class ColumnSelector : System.Collections.Specialized.OrderedDictionary {

    Select($line,$meta)
    {
        $meta | foreach{$this.add($_,(iex "`$line.$_"))}
    }

    ColumnSelector($line,$meta)
    {
        $this.select($line,$meta)
    }
}

import-csv $jobfile -Delimiter '|' | 
    foreach{[pscustomobject]([columnselector]::new($_,$meta))} | 
        Export-CSV $outputfile -Encoding UTF8 -NoTypeInformation -Delimiter '|'

Output

PS C:\>Get-Content $outputfile

"empid"|"name"|"zipcode"|"salary"
"1"|"Tom"|"11111"|"1000"
"2"|"Ann"|"22222"|"2000"

iRon · Answer 5 · 2020-08-01T12:02:30.513

Please note that the smart quotes are in consistent over the rows and there are empty lines between the rows (I highly recommend to reformat/update your question).
Anyways, as long as the quoting of the header is consistent between the two (ps_job.txt and ps_job_metadata.dat) files:

# $JobTxt = Get-Content .\ps_job.txt
$JobTxt = @'
“empid”|”name”|”deptid”|”zipcode”|”salary”|”gender”
“1”|”Tom”|”10″|”11111″|”1000″|”M”
“2”|”Ann”|”20″|”22222″|”2000″|”F”
'@

# $MetaDataTxt = Get-Content .\ps_job_metadata.dat
$MetaDataTxt = @'
“empid”|”name”|”zipcode”|”salary”
'@

$Job = ConvertFrom-Csv -Delimiter '|' $JobTxt
$MetaData = ConvertFrom-Csv -Delimiter '|' (@($MetaDataTxt) + 'x|')

$Job | Select-Object $MetaData.PSObject.Properties.Name

“empid” ”name” ”zipcode” ”salary”
------- ------ --------- --------
“1”     ”Tom”  ”11111″   ”1000″
“2”     ”Ann”  ”22222″   ”2000″

score 0 · Answer 6 · answered Aug 03 '20 at 13:57

Provided you want to keep those curly quotes and your code page and console font supports all the characters, you can do the following:

# Create array of properties delimited by |
$headers = (Get-Content .\ps_job_metadata.dat -Encoding UTF8) -split '\|'
Import-Csv ps_job.dat -Delimiter '|' -Encoding utf8 | Select-Object $headers

Compare columns between 2 files and delete non common columns using Powershell

6 Answers6