1

New to PowerShell..I'm trying to read a file line by line and regex the information into 2 arrays so I can do more processing using those arrays later.

Each line of my file looks like this, there are thousands of line...

Car   1FMZU77E22UC18440     Honda Civic
Car   SCBCR63W55C024793     GM Colorado 

I need to store VIN number into data1 array and store car model into data2 array. There are always 3 spaces between car column and VIN number column; 5 spaces between VIN number column and car model column. The length of the data varies but the spaces are used as delimiter.

My look between regex for VIN column is (?<=\s\s\s).*?(?=\s\s\s\s\s)

I tried the solution here but not sure how to store it into array - Read file line by line in PowerShell

foreach($line in Get-Content myfile.txt) {
if($line -match $regex){
    $data1 += $line.matches.value
    }
}

My data1 array is empty. How can I do this? Thanks.

Edit: there's no space after 3rd column. My regex for data2 array would be look behind (?<=\s\s\s\s\s). Not sure if it works since I can't store my data yet. I know my regex is from c#...not sure if it applies to PowerShell.

DL72
  • 127
  • 8
  • Ok how many spaces between the rest? You really aren't give us much to go off of. – Doug Maurer Sep 22 '20 at 02:03
  • You mean after the 3rd column? There is no space after the 3rd column. For each line, there's 3 spaces between 1111 (always fixed length) and xxxx (fixed length data); 5 spaces between xxxx and yyyy (yyyy is not fixed length data). I hope that clears up. – DL72 Sep 22 '20 at 02:11
  • So the 2222 and zzzzz and wwwww are variable length without separators? – Doug Maurer Sep 22 '20 at 02:48
  • PLEASE, add a realistic sample of the data _and wrap it in code formatting_ so that it can be easily read and copied for testing. – Lee_Dailey Sep 22 '20 at 04:00

1 Answers1

2

Without some real (mock) data, I don't think it's best to use regex. Besides, this can be simplified greatly by treating it as a CSV.

First for this test and so others can try it as well we will make a sample file

$tempfile = New-TemporaryFile

$data = @'
1111   xxxx     yyyyyyyyyyyyyyyy 2222 zzzz wwwwwww
3333   aaaa     bbbbbbbbbbbbbbbb 4444 cccc ddddddd
'@ | Set-Content $tempfile -Encoding utf8

Next, we read the info while replacing spaces with commas. If your data has spaces, then of course this would need adjustment or not be used, depending.

(Get-Content $tempfile) -replace '\s+',','

1111,xxxx,yyyyyyyyyyyyyyyy,2222,zzzz,wwwwwww
3333,aaaa,bbbbbbbbbbbbbbbb,4444,cccc,ddddddd

Now we know our data is proper, import as CSV while specifying headers. You could provide meaningful headers since you know what the info is.

(Get-Content $tempfile) -replace '\s+',',' | ConvertFrom-Csv -Header a,b,c,d,e,f

a : 1111
b : xxxx
c : yyyyyyyyyyyyyyyy
d : 2222
e : zzzz
f : wwwwwww

a : 3333
b : aaaa
c : bbbbbbbbbbbbbbbb
d : 4444
e : cccc
f : ddddddd

If you only want certain columns, simply select only those.

(Get-Content $tempfile) -replace '\s+',',' | 
    ConvertFrom-Csv -Header a,b,c,d,e,f | Select b,c,e,f -OutVariale data

b    c                e    f      
-    -                -    -      
xxxx yyyyyyyyyyyyyyyy zzzz wwwwwww
aaaa bbbbbbbbbbbbbbbb cccc ddddddd

Now assign it accordingly

$data | select b,e -OutVariable data1

b    e   
-    -   
xxxx zzzz
aaaa cccc

$data | select c,f -OutVariable data2

c                f      
-                -      
yyyyyyyyyyyyyyyy wwwwwww
bbbbbbbbbbbbbbbb ddddddd

Edit

Your meaningful data changes my recommendation. Also curious where the extra columns are as it's not like what you showed initially. Based on the data you've shown, I have three different options.

Again, starting with some sample data

$tempfile = New-TemporaryFile

@'
Car   1FMZU77E22UC18440     Honda CiviC
Car   SCBCR63W55C024793     GM Colorado
Truck   SZXYR63W55C165487     GM some model
'@ | Set-Content $tempfile -Encoding utf8

Option 1 - ConvertFrom-String

ConvertFrom-String can parse the data based off a template you provide as "training" data.

$template = @'
{Type*:abc}   {VIN:ABCDEFGH123456789}     {Model:some model}
{Type*:abcde}   {VIN:sample}     {Model:a some model}
'@

Now we apply the template. One aspect of this that makes it better than regex line by line, is you can use the fast -Raw parameter of get content which is very fast.

get-content $tempfile -Raw |
    ConvertFrom-String -TemplateContent $template -OutVariable data

Type  VIN               Model        
----  ---               -----        
Car   1FMZU77E22UC18440 Honda CiviC  
Car   SCBCR63W55C024793 GM Colorado  
Truck SZXYR63W55C165487 GM some model

If you don't need the type, just ignore it. Set it to your variables like

$data1,$data2 = $data.vin,$data.model

Contents of the variables $data1 and $data2

$data2
Honda CiviC
GM Colorado
GM some model

$data1
1FMZU77E22UC18440
SCBCR63W55C024793
SZXYR63W55C165487

Option 2 - Regex Get-Content / line by line

Get-Content $tempfile | foreach {
    if($_ -match '(?<=[\s]{3})(\w*)[\s]{5}(.*)$')
    {
        [PSCustomObject]@{
            Model = $matches.2
            VIN   = $matches.1
        }
    }
}

once again set the variables as you desire

$data1,$data2 = $data.vin,$data.model

Option 3 - Select-String (probably closer to Option 1 speed)

Select-String -path $tempfile -Pattern '(?<=[\s]{3})(\w*)[\s]{5}(.+)' -AllMatches | ForEach {
    $_.Matches | foreach {
        [PSCustomObject]@{
            Model = $_.groups[2].value
            VIN   = $_.groups[1].value
        }
    }
} -OutVariable data
Doug Maurer
  • 8,090
  • 3
  • 12
  • 13