I have a file that is sorted by its first column.

10,W,A
20,W,E
30,I,W
40,A,E
50,P,E
60,S,A
70,A,P
80,A,I
100,A,S
110,I,S
120,A,N
130,E,N

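For each value in the third column, I need to collect the first-column values of its rows, but only for rows that come before that value has appeared in the second column. Once a third-column value has appeared in the second column, any later rows carrying it in the third column should be ignored.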

My attempt using awk is as follows:

$ awk -F"," ' { f[$2]++; if( !f[$3] ) { d[$3]=$1 }  f[$3]++ } END { for(i in d) print i, d[i] } ' cg.txt
N 120
A 10
E 20

What I'm expecting is:

N 120, 130
A 10
E 20, 40, 50
Inian
stack0114106

5 Answers

perl -F, -lane'
   ++$seen{ $F[1] };
   push @{ $groups{ $F[2] } }, $F[0] if !$seen{ $F[2] };
   END {
      local $" = ", ";
      print "$_ @{ $groups{$_} }" for sort keys %groups;
   }
'
  • -F, -a causes the input line to be split at commas into @F.
  • We keep track of what values we've seen in the second column using %seen.
  • If the third-column value hasn't been seen in the second column yet, we add the first column to %groups, a hash of arrays.
  • At the end, we print out the hash of arrays. $" is set so that the array elements are separated by a comma and a space instead of the default single space.

See also: Specifying file to process to Perl one-liner

ikegami

Could you please try the following? It assumes the only condition is to print, for each third-column value, all first-column values up to that value's first occurrence in the second column (tested only with the samples provided).

awk '
BEGIN{
  FS=","
  OFS=", "
}
{
  a[$3]=(a[$3]?a[$3] OFS:"")$1
}
{
  ++c[$2]
}
($2 in a) && c[$2]==1{
  print $2 " " a[$2]
}
END{
 for(i in a){
     if(!(i in c)){
         print i" " a[i]
     }
 }
}'  Input_file
RavinderSingh13

Another awk solution:

$ awk -F, '{a[$2]; k=$3} 
       !(k in a) {b[k]=b[k] s[k] $1; s[k]=FS} 
       END       {for(k in b) print k, b[k]}' file

N 120,130
A 10
E 20,40,50
karakfa

Perl code for your data

use strict;
use warnings;

my %seen;
my %data;

while( <DATA> ) {
    chomp;
    my @a = split ',';

    push @{$data{$a[2]}}, $a[0] if not $seen{$a[2]};

    $seen{$a[1]} = 1;
}

while( my($k,$v) = each %data ) {
    printf "%s %s\n", $k, join ", ", @$v;
}

__DATA__
10,W,A
20,W,E
30,I,W
40,A,E
50,P,E
60,S,A
70,A,P
80,A,I
100,A,S
110,I,S
120,A,N
130,E,N
Polar Bear

Added as an answer from a comment, as requested by the OP.

Just remove f[$3]++ and replace d[$3]=$1 with if (d[$3]) { d[$3]=d[$3] ", " $1 } else { d[$3]=$1 }.
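
Applied to the command from the question, that change would look roughly like the sketch below. The function name collect_first_cols is hypothetical; the awk body is the original attempt with only the two modifications described here, spread over several lines for readability.

```shell
# Hypothetical wrapper name; the awk program is the attempt from the question
# with f[$3]++ removed and the plain assignment replaced by an append.
collect_first_cols() {
  awk -F"," '
    {
      f[$2]++                      # remember every value seen in column 2
      if (!f[$3]) {                # column 3 not yet seen in column 2
        if (d[$3]) { d[$3] = d[$3] ", " $1 } else { d[$3] = $1 }
      }
    }
    END { for (i in d) print i, d[i] }   # iteration order is unspecified
  ' "$@"
}

# Usage, assuming the sample data is saved as cg.txt:
#   collect_first_cols cg.txt
```

The order in which for (i in d) visits the keys is not guaranteed, so pipe the output through sort if a stable order matters.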

Nahuel Fouilleul