I am working with a file that contains three values per line: an ID (protein IDs, in case you are curious), a value, and then another value. It is tab-delimited, so it looks like this:
A2M 0.979569315988908 1
AACS 0.925340159491081 1
AAGAB 0.982296215686199 1
AAK1 0.736903840140103 1
AAMP 0.00589711816127862 0.138868449447202
AARS2 1 1
AARS 3.13300124295614e-05 0.00212792325492566
AARSD1 0.527417792161261 1
AASDH 0.869909252023668 1
AASDHPPT 0.763918221284724 1
AATF 0.691907759125663 1
ABAT 0.989693691462661 1
ABCA1 0.601194017450064 1
ABCA5 1 1
ABCA6 1 1
I am interested in sorting these IDs in alphabetical order and extracting various values. However, I noticed that sort orders the IDs differently depending on which columns I extract. When I execute:
cut --fields=1,2 input.txt | sort --key=1
The resulting file is:
A2M 0.979569315988908
AACS 0.925340159491081
AAGAB 0.982296215686199
AAK1 0.736903840140103
AAMP 0.00589711816127862
AARS2 1
AARS 3.13300124295614e-05
AARSD1 0.527417792161261
AASDH 0.869909252023668
AASDHPPT 0.763918221284724
AATF 0.691907759125663
ABAT 0.989693691462661
ABCA1 0.601194017450064
ABCA5 1
ABCA6 1
But when I execute:
cut --fields=1,3 input.txt | sort --key=1
I get:
A2M 1
AACS 1
AAGAB 1
AAK1 1
AAMP 0.138868449447202
AARS 0.00212792325492566
AARS2 1
AARSD1 1
AASDH 1
AASDHPPT 1
AATF 1
ABAT 1
ABCA1 1
ABCA5 1
ABCA6 1
Notice that the positions of AARS and AARS2 are switched, which they shouldn't be, since in both cases I am sorting on the first column only. I've never seen behavior like this from sort, and I've been using bash for a while now. Is this a bug, or am I doing something wrong?
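One check that might help narrow this down is sketched below. It is only a sketch under assumptions: it assumes the difference is related to the locale's collation rules and to the fact that --key=1 lets the sort key run from the first field to the end of the line, neither of which is confirmed here. The function name sort_by_id is made up for illustration; -t, -k1,1 and LC_ALL=C are standard sort options and environment settings.

```shell
# Hypothetical helper (name is illustrative only): sort tab-delimited
# lines on the first field alone, comparing bytes (C locale) so the
# result does not depend on the user's locale settings.
sort_by_id() {
    LC_ALL=C sort -t "$(printf '\t')" -k1,1
}

# The two rows in question, first with column 2, then with column 3.
printf 'AARS2\t1\nAARS\t3.13300124295614e-05\n' | sort_by_id
printf 'AARS2\t1\nAARS\t0.00212792325492566\n' | sort_by_id
```

If the order of AARS and AARS2 comes out the same in both runs, that would suggest the inconsistency comes from the key extending past the first column, the locale, or both, rather than from a bug in sort.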