0

I need to test the integrity of file before importing to SQL. Each row of the file should have the exact same amount of columns.

These are "|" delimited files. I also need to ignore the first line as it is garbage.

If every row does not have the same number of columns, then I need to write an error message.

I have tried using something like the following with no luck:

$colCnt = "c:\datafeeds\filetoimport.txt"
$file = (Get-Content $colCnt -Delimiter "|") 
$file = $file[1..($file.count - 1)]
Foreach($row in $file){
    $row.Count
}

Counting rows is easy. Columns is not. Any suggestions?

GreetRufus
  • 421
  • 2
  • 9
  • 19
  • row delimited or column delimited with "|"? http://ss64.com/ps/get-content.html In this context believe code is stating that the row is delimited by "|" (instead of newline) – Zach Bonham Apr 16 '14 at 20:20

3 Answers3

4

Yep, read the file skipping the first line. For each line split it on the pipe, and count the results. If it isn't the same as the previous throw an error and stops.

$colCnt = "c:\datafeeds\filetoimport.txt"
[int]$LastSplitCount = $Null
Get-Content $colCnt | ?{$_} | Select -Skip 1 | %{if($LastSplitCount -and !($_.split("|").Count -eq $LastSplitCount)){"Process stopped at line number $($_.psobject.Properties.value[5]) for column count mis-match.";break}elseif(!$LastSplitCount){$LastSplitCount = $_.split("|").Count}}

That should do it, and if it finds a bad column count it will stop and output something like:

Process stopped at line number 5 for column count mis-match.

Edit: Added a Where catch to skip blank lines ( ?{$_} )
Edit2: Ok, if you know what the column count should be then this is even easier.

Get-Content $colCnt | ?{$_} | Select -Skip 1 | %{if(!($_.split("|").Count -eq 210)){"Process stopped at line number $($_.psobject.Properties.value[5]), incorrect column count of: $($_.split("|").Count).";break}}

If you want it to return all lines that don't have 210 columns just remove the ;break and let it run.

TheMadTechnician
  • 34,906
  • 3
  • 42
  • 56
  • This works perfectly. Only thing I didn't think of is these files usually have one or two empty rows at the bottom. Is there a way to ignore blank lines and a footer? – GreetRufus Apr 17 '14 at 19:19
  • Updated my answer to take care of blank lines. To be honest mjolinor's answer is sexier than mine, but mine is simple and functional. His has the advantage of catching all lines that don't match. I suppose mine could be modified to do that, but it would get ugly. Both have the flaw of basing the entire thing off the first row. If it is wrong then it all goes to pot for either answer. If you need something that can account for that I can do it, but it'll be a bit more code and probably slower. – TheMadTechnician Apr 17 '14 at 19:26
  • Your correct about using the first row. Didn't think of that. I know what the column count should be (210) so will work on trying to adjust the script to use that variable – GreetRufus Apr 17 '14 at 19:45
  • Updated for exact column count. That makes it considerably easier really, you should have included that in the original post =) – TheMadTechnician Apr 17 '14 at 20:17
  • This is a great post. I have a question. Is there a way to remove a comma from within a DOUBLE QUOTE value where the comma is counted when trying to figure out if I have more than the column count? I'm not sure if anyone will see this post but any help would be appreciated. Thanks. – Melinda Mar 30 '17 at 20:55
  • Hi! I'm pretty sure that's possible, but is really its own question. Please post a new question, and include what you've tried, and what you're working with, and I'm sure somebody here can help you. – TheMadTechnician Mar 30 '17 at 21:13
1

A more generic approach, including a RegEx filter:

$path = "path\to\folder"
$regex = "regex"
$expValue = 450

$files= Get-ChildItem $path | Where-Object {$_.Name -match $regex}
Foreach( $f in $files) {
    $filename = $f.Name
    echo $filename
    $a = Get-Content $f.FullName;
    $i = 1;
    $e = 0;
    echo "Starting...";
    foreach($line in $a)
    {
        if ($line.length -ne $expValue){
            echo $filename
            $a | Measure-Object -Line
            echo "Long:"
            echo $line.Length;
            echo "Line Nº: "
            echo $i;
            $e = $e + 1;       
        }
        $i = $i+1;
    }
    echo "Finished";
    if ($e -ne 0){
        echo $e "errors found";
    }else{
        echo "No errors"
        echo ""
    }
}
echo "All files examined"
Pelayo
  • 114
  • 5
0

Another possibility:

$colCnt = "c:\datafeeds\filetoimport.txt"

$DataLine = (Get-Content $colCnt -TotalCount 2)[1]
$DelimCount = ([char[]]$DataLine -eq '|').count
$MatchString = '.*' + ('|.*' * $DelimCount )

$test = Select-String -Path $colCnt -Pattern $MatchString -NotMatch |
  where { $_.linenumber -ne 1 }

That will find the number of delimiter characters in the second line, and build a regex pattern that can be used with Select-String.

The -NotMatch switch will make it return any lines that don't match that pattern as MatchInfo objects that will have the filename, line number and content of the problem lines.

Edit: Since the first line is "garbage" you probably don't care if it didn't match so I added a filter to the result to drop that out.

mjolinor
  • 66,130
  • 7
  • 114
  • 135