I have the below file:
ab=5
ac=6
ad=5
ba=5
bc=7
bd=4
ca=5
cb=7
cd=3
...
"ab" and "ba", "ac" and "ca", "bc" and "cb" are redundant: each pair uses the same two characters in a different order. How do I eliminate these redundant lines in bash?
Expected output:
ab=5
ac=6
ad=5
bc=7
bd=4
cd=3
$ awk '{x=substr($0,1,1); y=substr($0,2,1)} !seen[x>y?x y:y x]++' file
ab=5
ac=6
ad=5
bc=7
bd=4
cd=3
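The one-liner above works by building an order-independent key from the first two characters, so that "ab" and "ba" map to the same array index. A spelled-out sketch of the same idea follows, assuming two-character keys as in the question; the function name dedupe_pairs is made up for illustration and is not part of the original answer.

```shell
# dedupe_pairs is a hypothetical helper name used only for this sketch.
dedupe_pairs() {
    awk '{
        x = substr($0, 1, 1)          # first character of the line
        y = substr($0, 2, 1)          # second character of the line
        key = (x > y) ? x y : y x     # larger character first, so "ab" and "ba" share a key
    }
    !seen[key]++                      # print only the first line seen for each key
    ' "$@"
}

# Reads from stdin when no file name is given.
printf '%s\n' ab=5 ba=5 bc=7 cb=7 | dedupe_pairs
```

Called as `dedupe_pairs file`, it should behave like the one-liner; the first line seen for each pair is the one that is kept.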
Short awk solution:
awk '{ c1=substr($0,1,1); c2=substr($0,2,1) }!a[c1 c2]++ && !((c2 c1) in a)' file
c1=substr($0,1,1) - assign the extracted 1st character to variable c1
c2=substr($0,2,1) - assign the extracted 2nd character to variable c2
!a[c1 c2]++ && !((c2 c1) in a) - the crucial condition: print the line only if this 2-character sequence has not been seen before and its reversed counterpart has not been seen either
The output:
ab=5
ac=6
ad=5
bc=7
bd=4
cd=3
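One case the sample data does not cover is a key made of the same character twice, such as aa=1. In the condition `!a[c1 c2]++ && !((c2 c1) in a)`, the increment creates the array entry before the `in` test runs, so such a line would be dropped even on its first appearance. A possible variant is sketched below, assuming those lines should be kept; the function name dedupe_keep_doubles is invented for this sketch.

```shell
# dedupe_keep_doubles is a hypothetical name, not from the original answer.
dedupe_keep_doubles() {
    awk '{ c1 = substr($0, 1, 1); c2 = substr($0, 2, 1) }
    # Test both orderings before storing anything, so "aa" is not
    # hidden by its own freshly created entry.
    !((c1 c2) in a) && !((c2 c1) in a) { a[c1 c2]; print }
    ' "$@"
}

printf '%s\n' aa=1 ab=5 ba=5 aa=2 | dedupe_keep_doubles
```

For the question's own input, where no key repeats a character, this should give the same result as the original condition.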
Here's one with perl, a generic solution irrespective of the number of characters before =
$ cat ip.txt
ab=5
ac=6
abd=51
ba=5
bad=23
bc=7
bd=4
ca=5
cb=7
cd=3
$ perl -F= -lane 'print if !$seen{join "",sort split//,$F[0]}++' ip.txt
ab=5
ac=6
abd=51
bc=7
bd=4
cd=3
As in awk, uninitialized variables evaluate to false by default
-F= uses = as the field separator; the resulting fields are saved in the @F array
$F[0] gives the first field, i.e. the characters before =
split//,$F[0] gives an array of the individual characters
sort does string sorting by default
join "" then forms a single string from the sorted characters, with the null string as separator
See the perl command-line documentation (perlrun) for the -lane and -F options. Use -i for in-place editing
Could you please try the following and let me know if this helps you? I have written and tested it with GNU awk.
awk -F'=' '
{ split($1, array, "") }          # GNU awk: an empty separator splits $1 into single characters
!((array[1], array[2]) in a) {    # this pair has not been recorded in either order
  a[array[1], array[2]]           # record the pair ...
  a[array[2], array[1]]           # ... and its reverse, so "ba" is skipped after "ab"
  print
}
' Input_file
Output will be as follows.
ab=5
ac=6
ad=5
bc=7
bd=4
cd=3
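For keys longer than two characters, as in the ip.txt example from the Perl answer, the same sort-the-characters idea can also be sketched in plain awk. This is only an illustration under stated assumptions: the function name dedupe_sorted_key is hypothetical, the key is taken to be everything before the first =, and the characters are collected with substr rather than split with an empty separator, since the latter is not guaranteed outside GNU awk.

```shell
# dedupe_sorted_key is a hypothetical name used only for this sketch.
dedupe_sorted_key() {
    awk -F= '
    {
        n = length($1)
        # Copy the characters before "=" into an array.
        for (i = 1; i <= n; i++) ch[i] = substr($1, i, 1)
        # Insertion sort, so that "bad" and "abd" both become "abd".
        for (i = 2; i <= n; i++) {
            c = ch[i]
            for (j = i - 1; j >= 1 && ch[j] > c; j--) ch[j + 1] = ch[j]
            ch[j + 1] = c
        }
        # Join the sorted characters back into a single key.
        key = ""
        for (i = 1; i <= n; i++) key = key ch[i]
    }
    !seen[key]++    # keep the first line for each sorted key
    ' "$@"
}

printf '%s\n' ab=5 abd=51 ba=5 bad=23 | dedupe_sorted_key
```

As with the other answers, the first line seen for a given set of characters is kept and later permutations are dropped.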