-3

I have a bunch of files in folder A and their corresponding metadata files in folder B. I want to loop though the data files and check if the columns are the same in the metadata file, (since incoming data files could have new columns added at any position without notice). If the columns in both files match, no action to is to be taken. If Data file has more columns than metadata file, then those columns should be deleted from incoming data file. Any help would be appreciated. Thanks!

Data file is ps_job.txt

“empid”|”name”|”deptid”|”zipcode”|”salary”|”gender”

“1”|”Tom”|”10″|”11111″|”1000″|”M”

“2”|”Ann”|”20″|”22222″|”2000″|”F”

Meta data file is ps_job_metadata.dat

“empid”|”name”|”zipcode”|”salary”

I would like my output to be

“empid”|”name”|”zipcode”|”salary”

“1”|”Tom”|”11111″|”1000″

“2”|”Ann”|”22222″|”2000″

  • 2
    Welcome to SO. To set your expectations you should take the [Tour] and read at least the help topic [Ask] and Please, [format you code, errors and sample input/output properly](http://meta.stackexchange.com/a/22189/248777). – Olaf Jul 31 '20 at 20:08
  • When you crosspost the same question at the same time to different forums you should at least post links to the other forums along with your question to avoid people willing to you help making their work twice or more.Thanks [Powershell.org](https://powershell.org/forums/topic/compare-columns-between-2-files-and-delete-non-common-columns-and-data/) – Olaf Jul 31 '20 at 20:10

6 Answers6

0

That's a seemingly simple question with a very complicated answer. However, I've broken down the code for what you will need to do. Here are the steps that need to happen in order for powershell to do everything you're asking of it.

  • Read the .dat file
  • Save the .dat data into an object
  • Read the .txt file
  • Save the .txt header into an object
  • Check for the differences
  • Delete the old text file (that had too many columns)
  • Create a new text file with the new columns

I've made some assumptions in how this looks. However, with the way I've structured the code, it should be easy enough to make modifications as necessary if my assumptions are wrong. Here are my assumptions:

  • The text file will always have all of the columns that the DAT file has (even though it will sometimes have more)
  • The dat file is structured like a text file and can be directly imported into powershell.

And here is the code, with comments. I've done my best to explain the purpose of each section, but I've written this with the expectation that you have a basic knowledge of powershell, especially arrays. If you have questions I'll do my best to answer, though I'll ask that you refer to the section of code you have questions on.


### 
### The paths. I'm sure you will have multiples of each file. However, I didn't want to attempt to pull in 
### the files with this sample code as it can vary so much in your environment. 
###

$dat = "C:\StackOverflow\thingy.dat"
$txt = "C:\stackoverflow\ps_job.txt"


###
### This is the section to process the DAT file
###


# This will read the file and put it in a variable

$dat_raw = get-content -Path $dat

# Now, let's seperate out the punctuation and give us our object 

$dat_array = $dat_raw.split("|")
$dat_object = @()

foreach ($thing in $dat_array)
{
    $dat_object+=$thing.Replace("""","")
}

###
### This is the section to process the TXT file
###

# This will read the file and put it into a variable

$txt_raw = get-content -Path $txt

# Now, let's seperate out the punctuation and give us our object

$txt_header_array = $txt_raw[0].split("|")
$txt_header_object = @()

foreach ($thing in $txt_header_array)
{
    $txt_header_object += $thing.Replace("""","")
}

###
### Now, let's figure out which columns we're eliminating (if  any)
###

$x = 0
$total = $txt_header_object.count
$to_keep = @()

While ($x -le $total)
{
    if ($dat_object -contains $txt_header_object[$x])
    {
        $to_keep += $x 
    }

    $x++
}

### Now that we know which objects to keep, we can apply the changes to each line of the text file. 
### We will save each line to a new variable. Then, once we have the new variable, we will delete 
### The existing file with a new file that has only the data we want.Note, we will only run this 
### Code if there's a difference in the files. 

if ($total -ne $to_keep.count)
{
    
    ### This first section will go line by line and 'fix' the number of columns
    
    $new_text_file = @()

    foreach ($line in $txt_raw)
    {
        if ($line.Length -gt 0)
        {
            # Blank out the array each time
            $line_array = @()

            foreach ($number in $to_keep)
            {
                $line_array += ($line.split("|"))[$number]
            }

            $new_text_file += $line_array -join "|"
        }
        else
        {
            $new_text_file +=""
        }
    }

    ### This second section will delete the original file and replace it with our good
    ### file that has been created. 

    Remove-item -Path $txt

    $new_text_file | out-file -FilePath $txt

}

V3rmouth
  • 171
  • 4
0

This small example can be a start for your solution :

$ps_job = Import-Csv D:\ps_job.txt -Delimiter '|'
$ps_job_metadata = (Get-Content D:\ps_job_metadata.txt) -split '\|'-replace '"'

foreach( $d in (Compare-Object $column $ps_job_metadata))
{
    if($d.SideIndicator -eq '<=')
    {
        $ps_job | %{ $_.psobject.Properties.Remove($d.InputObject) }
    }
}
$ps_job | Export-Csv -Path D:\output.txt -Delimiter '|' -NoTypeInformation
YannCha
  • 231
  • 1
  • 4
0

I tried this and it works.

$outputFile = "C:\Script_test\ps_job_mod.dat"

$sample = Import-Csv -Path "C:\Script_test\ps_job.dat" -Delimiter '|'

$metadataLine = Get-Content -Path "C:\Script_test\ps_job_metadata.txt" -First 1

$desiredColumns = $metadataLine.Split("|").Replace("`"","")

$sample | select $desiredColumns | Export-Csv $outputFile -Encoding UTF8 -NoTypeInformation -Delimiter '|'
Doug Maurer
  • 8,090
  • 3
  • 12
  • 13
0

Here's the same answer I posted to your question on Powershell.org

$jobfile    = "ps_job.dat"
$metafile   = "ps_job_metadata.dat"
$outputfile = "some_file.csv"

$meta = ((Get-Content $metafile -First 1 -Encoding UTF8) -split '\|')

Class ColumnSelector : System.Collections.Specialized.OrderedDictionary {

    Select($line,$meta)
    {
        $meta | foreach{$this.add($_,(iex "`$line.$_"))}
    }

    ColumnSelector($line,$meta)
    {
        $this.select($line,$meta)
    }
}

import-csv $jobfile -Delimiter '|' | 
    foreach{[pscustomobject]([columnselector]::new($_,$meta))} | 
        Export-CSV $outputfile -Encoding UTF8 -NoTypeInformation -Delimiter '|'

Output

PS C:\>Get-Content $outputfile

"empid"|"name"|"zipcode"|"salary"
"1"|"Tom"|"11111"|"1000"
"2"|"Ann"|"22222"|"2000"
Doug Maurer
  • 8,090
  • 3
  • 12
  • 13
0

Please note that the smart quotes are in consistent over the rows and there are empty lines between the rows (I highly recommend to reformat/update your question).
Anyways, as long as the quoting of the header is consistent between the two (ps_job.txt and ps_job_metadata.dat) files:

# $JobTxt = Get-Content .\ps_job.txt
$JobTxt = @'
“empid”|”name”|”deptid”|”zipcode”|”salary”|”gender”
“1”|”Tom”|”10″|”11111″|”1000″|”M”
“2”|”Ann”|”20″|”22222″|”2000″|”F”
'@

# $MetaDataTxt = Get-Content .\ps_job_metadata.dat
$MetaDataTxt = @'
“empid”|”name”|”zipcode”|”salary”
'@

$Job = ConvertFrom-Csv -Delimiter '|' $JobTxt
$MetaData = ConvertFrom-Csv -Delimiter '|' (@($MetaDataTxt) + 'x|')

$Job | Select-Object $MetaData.PSObject.Properties.Name

“empid” ”name” ”zipcode” ”salary”
------- ------ --------- --------
“1”     ”Tom”  ”11111″   ”1000″
“2”     ”Ann”  ”22222″   ”2000″
iRon
  • 20,463
  • 10
  • 53
  • 79
0

Provided you want to keep those curly quotes and your code page and console font supports all the characters, you can do the following:

# Create array of properties delimited by |
$headers = (Get-Content .\ps_job_metadata.dat -Encoding UTF8) -split '\|'
Import-Csv ps_job.dat -Delimiter '|' -Encoding utf8 | Select-Object $headers
AdminOfThings
  • 23,946
  • 4
  • 17
  • 27