0

Really need help on this :(I 'll try to be as simple as possible.

I got one big file looking like this:

ID,Info1,Info2,info3,...

On each line, i got one ID and a lot of stuff, comma separated. There can be > 3000 lines.

Now i got a second file like this :

ID,Info4,Info5,Info6,...

The first file contains ALL the elements whereas the second file contains only some of them.

For example, first one:

BLA1,some stuff...
BLA2,some stuff...
BLA3,some stuff...
ALO1,some stuff...
ALO2,some stuff...

And the second one :ยจ

BLA3,some stuff2... 
ALO1,some stuff2...
BLA1,some stuff2... 

What i want is simple, I want to append all the 'some stuff2...' of the second file to the first one like a join type=left with sql

I want the first file to have now :

BLA1,some stuff...,some stuff2...
BLA2,some stuff...
BLA3,some stuff...,some stuff2...
ALO1,some stuff...,some stuff2...
ALO2,some stuff...

I tried something like this :

ForEach ($line in $file1) {
    $colA = $line.Split(',')
    ForEach ($line in $file2) {
        $colB = $line.Split(',')
        if($colA[0]-eq $colB[0]) { #Item found in file2
            $out += $date + $colA[1]+","+ ... +","+ $colB[1]+","+ ... +"`n"
        }else { 
            $out += $date + $colA[1]+","+ ... +"`n"
        }
    }
}

But it takes so much time it dosnt success (and maybe there were other problems i didnt see). What's the best way? a 2D Array? I could try to sort the IDs and then script a little, but as its not numerical only i don't know how to process.

Thks a lot guys for your help,

timmalos
  • 532
  • 3
  • 9
  • 24

3 Answers3

2

Use a hashtable where the key is the ID.

$ht = [ordered]@{}
foreach ($line in $file1) {
    $id,$rest = $line -split ',',2
    $ht[$id] = $line
}
foreach ($line in $file2) {
    $id,$rest = $line -split ',',2
    if ($ht.ContainsKey($id)) {
        $ht[$id] += ",$rest"
    }
    else {
        $ht[$id] = $line
    }
}
$ht.Values > newfile.txt
Keith Hill
  • 194,368
  • 42
  • 353
  • 369
  • Ah if I don't use $rest, the ID is now an array. Good point, I'll update the script. Thanks. โ€“ Keith Hill Aug 30 '13 at 00:10
  • I must use Powershell v2. So i removed the [ordered] which i think is not necessary?, and i need to find a way to write the hastable cause $ht.Values returns System.Collections.Hashtable+ValueCollection โ€“ timmalos Aug 30 '13 at 07:07
  • 1
    $ht.Values is another V3-ism. Try `$ht.GetEnumerator() | Foreach {$_.Value}`. โ€“ Keith Hill Aug 30 '13 at 15:29
1

I went with the assumption that you either have known header lines or can add them...

f1.csv

Name,Item_1
BLA1,thing_bla1_1
ALB1,thing_alb1_1
BLA2,thing_bla2_1
ALB2,thing_alb2_1
BLA3,thing_bla3_1
ALB3,thing_alb3_1

f2.csv

Name,Item_2
BLA3,thing_bla3_2
ALB3,thing_alb3_2
BLA1,thing_bla1_2
ALB1,thing_alb1_2
BLA2,thing_bla2_2
ALB2,thing_alb2_2

Code:

$grouped = Import-Csv .\f1.csv, .\f2.csv | group -property Name -ashashtable

$($grouped.Keys | foreach {$obj = $grouped.Item("$_")[0].Name + "," + $grouped.Item("$_")[0].Item_1 + "," + $grouped.Item("$_")[1].Item_2; $obj}) | Out-File .\test.csv

What we are doing here is importing the two CSVs into one element, then grouping the items of the same name in the hash table. Then we pipe the keys (the non-duplicated names from the files) into a foreach that combines them into one line. We need the $() around those statements to allow the output to be piped to Out-File.

I'm nearly positive that there is a cleaner way to do the inside of the foreach, but this does work.

The output (text.csv):

ALB1,thing_alb1_1,thing_alb1_2
BLA2,thing_bla2_1,thing_bla2_2
ALB3,thing_alb3_1,thing_alb3_2
BLA1,thing_bla1_1,thing_bla1_2
ALB2,thing_alb2_1,thing_alb2_2
BLA3,thing_bla3_1,thing_bla3_2
philselmer
  • 751
  • 4
  • 22
0

If you want to do a LEFT JOIN, you could load the files into a temporary database and actually do a LEFT JOIN. See here for an example using SQLite.

Community
  • 1
  • 1
Ansgar Wiechers
  • 193,178
  • 25
  • 254
  • 328