
I can't figure out how to solve a problem with a text file, since I am a beginner with Linux commands and bash scripting.

I have a text file like this:

object1 10.603  0.757
object1 10.523  0.752
object1 10.523  0.752
object1 10.456  0.747
object1 10.456  0.747
object1 10.271  0.734
object2 11.473  0.194
object2 11.460  0.194
object2 11.445  0.191
object2 11.421  0.190
object3 9.272   0.12
object3 9.236   0.12
object3 8.814   0.119
object3 0.968   0.119
object3 10.959  0.119

and I have to perform a particular cut-and-sort operation on this file: for every group of lines containing the word "object1", "object2", and so on, I want to print only the line with the highest value in the third column; then I want to sort the output of that operation by the values of the third column.

The output, for the sake of clarity, should be like this:

object1 10.603  0.757
object2 11.473  0.194
object3 9.272   0.12

Any suggestions for the Linux commands to use, and/or a bash script?

Thanks to everyone.

Dudi Boy

4 Answers


Using sort and awk:

sort -k1,1 -k3rn -k2rn file.txt | awk '!seen[$1] {print} {seen[$1]++}'

sort sorts first by the first field, then by the third field in reverse numeric order, then by the second field in reverse numeric order (this last key may be omitted if it doesn't matter). awk then prints only the first line it sees for each distinct value of the first field.
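
If the final output must be ordered by the third column rather than by the object name (as in the expected output above), the deduplicated result can be piped through sort once more; a minimal sketch of that variant:

sort -k1,1 -k3rn -k2rn file.txt | awk '!seen[$1] {print} {seen[$1]++}' | sort -k3rn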

gustgr

One in awk:

$ awk '{
    if(m[$1]<$3) {   # if the 3rd field is bigger than the stored max for this 1st field
        m[$1]=$3     # replace max value
        r[$1]=$0     # store record
    }
}
END {                # in the end
    for(i in r)      # iterate hashed records
        print r[i]   # and output
}' file

Output (in no particular order; if sorting is needed, pipe through sort or, with GNU awk, set PROCINFO["sorted_in"]="@ind_str_asc" at the beginning of the END{} block):

object1 10.603  0.757
object2 11.473  0.194
object3 9.272   0.12
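
A GNU awk-only sketch of that sorted variant (output ordered by object name; PROCINFO["sorted_in"] is a gawk feature):

$ awk '{
    if(m[$1]<$3) {
        m[$1]=$3
        r[$1]=$0
    }
}
END {
    PROCINFO["sorted_in"]="@ind_str_asc"  # gawk only: iterate indices in ascending string order
    for(i in r)
        print r[i]
}' file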

Update:

Another approach using sort and uniq; shuf is there only to demonstrate that the input order does not matter:

$ sort -k1r -k3n <(shuf file) | uniq -w 7
object3 9.272   0.12
object2 11.473  0.194
object1 10.603  0.757

For grouping on the first field I used -w (from man uniq); this works here because every object name is exactly 7 characters long:

-w, --check-chars=N
      compare no more than N characters in lines
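
If the output should additionally be ordered by the third column, as in the question's expected output, one more sort can be appended; a small sketch:

$ sort -k1r -k3n <(shuf file) | uniq -w 7 | sort -k3rn
object1 10.603  0.757
object2 11.473  0.194
object3 9.272   0.12
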
James Brown

Here is another awk script that does the job.

script.awk

$1 == currObj {   # for each reoccurring object
    if ( ($3 + 0) > maxArr[$1] ) {  # if the 3rd field exceeds the stored max
        maxArr[$1] = $3 + 0;        # store the new max in maxArr
        fld2Arr[$1] = $2;           # keep the 2nd field of the max line as well
    }
    next;         # skip to reading the next line
}
{                 # for each line having a new object
    currObj = $1; # store the current object (1st field) in variable currObj
    maxArr[$1] = $3;  # reset maxArr to the current value
    fld2Arr[$1] = $2; # store the 2nd field in an array
}
END {             # post processing
    for (i in maxArr) print i, fld2Arr[i], maxArr[i]; # print the stored values for each object
}

running:

awk -f script.awk input.txt

output:

object1 10.603 0.757
object2 11.473 0.194
object3 9.272 0.12
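
Note that the script detects a "new object" by a change in the first field, so it assumes the lines for each object are contiguous, as in the sample. If the input were not already grouped, one option would be to sort on the first field first, for example:

sort -k1,1 input.txt | awk -f script.awk
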
Dudi Boy

Use awk to filter the data before you sort it.

awk 'a[$1] < $3 {a[$1] = $3; b[$1]=$0} END {for (i in a) print b[i]}' input | sort -k3rn
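
Run against the sample data in input, this should print:

object1 10.603  0.757
object2 11.473  0.194
object3 9.272   0.12
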
William Pursell