I have the following file:

ab=5
ac=6
ad=5
ba=5
bc=7
bd=4
ca=5
cb=7
cd=3
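...

The pairs "ab" and "ba", "ac" and "ca", and "bc" and "cb" are redundant: each is the same two letters in reverse order. How do I eliminate these redundant lines in bash, keeping only the first occurrence of each pair?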

Expected output:

ab=5
ac=6
ad=5
bc=7
bd=4
cd=3
RomanPerekhrest
xavi
  • you are expected to add your own code/research effort while asking, see https://meta.stackoverflow.com/questions/261592/how-much-research-effort-is-expected-of-stack-overflow-users... though, given interesting question, you've got plenty of answers this time :) – Sundeep Dec 30 '17 at 14:23

4 Answers

$ awk '{x=substr($0,1,1); y=substr($0,2,1)} !seen[x>y?x y:y x]++' file
ab=5
ac=6
ad=5
bc=7
bd=4
cd=3
Ed Morton

Short awk solution:

awk '{ c1=substr($0,1,1); c2=substr($0,2,1) }!a[c1 c2]++ && !((c2 c1) in a)' file
  • c1=substr($0,1,1) - assign the extracted 1st character to variable c1
  • c2=substr($0,2,1) - assign the extracted 2nd character to variable c2
  • !a[c1 c2]++ && !((c2 c1) in a) - the crucial condition: print the line only if this 2-character key has not been seen before and its reversed form has not been seen either

The output:

ab=5
ac=6
ad=5
bc=7
bd=4
cd=3
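
For readers who find the one-line condition dense, here is a minimal sketch of the same idea written as explicit steps. It is not the answer's exact code: the function name dedupe_pairs and the variables key, rev and seen are made up for illustration, and it assumes every key is exactly two characters long.

```shell
# Hypothetical helper: keep only the first line of each two-character pair,
# treating "ab" and "ba" as the same pair.
dedupe_pairs() {
  awk '{
    key = substr($0, 1, 2)                        # key as written, e.g. "ba"
    rev = substr($0, 2, 1) substr($0, 1, 1)       # same key reversed, e.g. "ab"
    if ((key in seen) || (rev in seen)) next      # pair already printed: skip
    seen[key] = 1                                 # remember the pair
    print
  }' "$@"
}
```

Call it as dedupe_pairs file. One difference worth knowing: this version keeps a line whose key is its own reverse, such as aa=1, whereas the one-liner drops such a line, because c2 c1 is then the very key that !a[c1 c2]++ has just stored.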
RomanPerekhrest

Here's one with perl, a generic solution that works irrespective of the number of characters before =

$ cat ip.txt
ab=5
ac=6
abd=51
ba=5
bad=23
bc=7
bd=4
ca=5
cb=7
cd=3

$ perl -F= -lane 'print if !$seen{join "",sort split//,$F[0]}++' ip.txt
ab=5
ac=6
abd=51
bc=7
bd=4
cd=3
  • like awk, by default uninitialized variables evaluate to false
  • -F= use = as field separator, results saved in @F array
  • $F[0] will give the first field, i.e. the characters before =
  • split//,$F[0] will give array with individual characters
  • sort by default does string sorting
  • join "" will then form a single string from the sorted characters, with the empty string as separator
  • See https://perldoc.perl.org/perlrun.html#Command-Switches for documentation on the -lane and -F options. Use -i for in-place editing
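
The split, sort and join steps described in the bullets are not specific to perl. As a rough sketch, assuming an awk with no built-in sort function, the same sorted-character key can be built by hand with a small insertion sort. The function name dedupe_sorted_keys and the array names ch and seen are illustrative only and do not come from the answer.

```shell
# Hypothetical helper: treat keys made of the same characters in any order
# (ab/ba, abd/bad, ...) as duplicates and keep only the first one seen.
dedupe_sorted_keys() {
  awk -F= '{
    n = length($1)
    # Split the key into single characters with substr (portable)
    for (i = 1; i <= n; i++) ch[i] = substr($1, i, 1)
    # Insertion sort of the characters, using string comparison
    for (i = 2; i <= n; i++) {
      c = ch[i]
      for (j = i - 1; j >= 1 && ch[j] > c; j--) ch[j + 1] = ch[j]
      ch[j + 1] = c
    }
    # Join the sorted characters back into one canonical key
    key = ""
    for (i = 1; i <= n; i++) key = key ch[i]
    if (!seen[key]++) print
  }' "$@"
}
```

Running dedupe_sorted_keys ip.txt on the sample input above should print the same six lines as the perl command.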
Sundeep

Could you please try the following and let me know if it helps? I have written and tested it with GNU awk.

awk -F'=' '{
  # Split the key into characters (an empty separator works in GNU awk
  # but is not guaranteed by POSIX)
  split($1, array, "")
}
!((array[1], array[2]) in a) {
  # Record the pair in both orders so the reversed key is rejected later
  a[array[1], array[2]]
  a[array[2], array[1]]
  print
}
' Input_file

Output will be as follows.

ab=5
ac=6
ad=5
bc=7
bd=4
cd=3
RavinderSingh13
  • Using a null (empty) string as a field separator is undefined behavior per POSIX, so only some awks (e.g. GNU awk) will split the string into characters; others will do other things. – Ed Morton Dec 30 '17 at 13:19
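
Following up on that comment: a portable variant of this answer could extract the two characters with substr() instead of relying on an empty separator in split(). This is only a minimal sketch, assuming two-character keys as in the question; the function name dedupe_portable and the variables x and y are hypothetical.

```shell
# Hypothetical helper: same both-orders bookkeeping as the answer,
# but without split($1, array, "").
dedupe_portable() {
  awk -F= '{
    x = substr($1, 1, 1)   # first character of the key
    y = substr($1, 2, 1)   # second character of the key
  }
  !((x, y) in a) {
    a[x, y] = 1            # record the pair as written
    a[y, x] = 1            # and reversed, so "ba" is rejected after "ab"
    print
  }' "$@"
}
```

Call it as dedupe_portable Input_file; on the question's sample data it should give the same six lines shown above.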