
I have a set of directories:

RUN1 RUN2 RUN3

Within each of those directories, I have files. RUN1 has:

mod1_1 mod1_2 mod1_3

and RUN2 has:

mod2_1 mod2_2 mod2_3

etc.

Each file has lines like this (this is mod1_1):

8.69e-01 2.59e-01 7.82e-01 4.92e-01
8.69e-01 2.56e-01 7.84e-01 4.95e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
8.73e-01 2.53e-01 7.81e-01 4.99e-01

And this is mod1_2:

8.69e-01 2.59e-01 7.82e-01 4.98e-01
8.69e-01 2.56e-01 7.84e-01 4.90e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
8.73e-01 2.53e-01 7.81e-01 4.99e-01

I want to create a new file that contains, for each mod file, only the line with the smallest number in column 4. For example, suppose mod1_1 and mod1_2 are the only files. Then the new file should contain line 1 from mod1_1 and line 2 from mod1_2:

8.69e-01 2.59e-01 7.82e-01 4.92e-01  
8.69e-01 2.56e-01 7.84e-01 4.90e-01

I would like to do this for each RUN directory. I have tried this:

#/bin/bash

finddir=$(find -type d -name 'RUN*' | sort) #find the dirs
for i in $finddir; do
        cd $i
        echo $(pwd)
        findfiles=$(find -type f -name 'mod*' | sort -V) #find the files
        echo $findfiles
        for j in $findfiles; do
                s1=$(sort -k3,3 j)
                echo $s1
done

My problem is the sort command, and I don't know how to write the results to a file. Any ideas?

Pseudocode in case it's helpful:

For each directory RUN*
    For each file mod*
        get the minimum value in column 4, save the line that has that value
    End for 
    Write the lines that had the minimum values to a new file
End for
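The pseudocode maps fairly directly onto two nested loops. Below is a minimal sketch, assuming bash and GNU-style `sort`. The function names and the per-directory output file `min_col4.txt` are hypothetical, chosen only for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical output name: min_col4.txt, written inside each RUN* directory.

# Print the line with the smallest value in column 4 of one file.
# -k4,4 limits the sort key to column 4; -g compares general numbers,
# so exponent notation such as 4.95e-02 is ordered by magnitude.
# LC_ALL=C makes sure "." is read as the decimal point.
min_col4_line() {
    LC_ALL=C sort -k4,4 -g "$1" | head -n 1
}

# For each RUN* directory under a base directory (default: current one),
# write one minimum line per mod* file into that directory's min_col4.txt.
collect_min_lines() {
    local base=${1:-.} dir file
    for dir in "$base"/RUN*/; do
        [ -d "$dir" ] || continue         # skip the literal pattern if nothing matched
        for file in "$dir"mod*; do
            [ -f "$file" ] || continue
            min_col4_line "$file"
        done > "${dir}min_col4.txt"       # one redirection per directory
    done
}
```

Running `collect_min_lines .` from the directory that contains RUN1, RUN2, ... would then leave one `min_col4.txt` per RUN directory. Globs expand in alphabetical order, so `mod1_10` would come before `mod1_2`; if that matters, the file list could be passed through `sort -V` as in the question.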

EDIT: Still having issues. Here's how I've modified it:

#/bin/bash

finddir=$(find -type d -name 'RUN*' | sort) #find the dirs
for i in $finddir; do
        cd $i
        echo $(pwd)
        findfiles=$(find -type f -name 'mod*' | sort -V) #find the files
        for j in $findfiles; do
                s1=$(sort -k 4 -g $j)
                echo -n "$s1"
        done
cd ..
done

I was 'cd'ing in the wrong place. This is a bit better (it prints all four numbers on each line), but it doesn't return only the line with the smallest column-4 value from each file. Also, I still don't know how to write the final results to a new file.
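For writing the results to a new file, one option is to redirect the output of the whole loop once rather than appending inside it. This is a minimal sketch, assuming all RUN* directories sit in one base directory. The function name `write_all_min_lines` and the file-name prefix on each output line are my own additions and not part of the output format described above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: one minimum line per RUN*/mod* file, all collected
# into a single output file, each line prefixed with the file it came from.
write_all_min_lines() {
    local out=$1 base=${2:-.} f
    for f in "$base"/RUN*/mod*; do
        [ -f "$f" ] && [ -s "$f" ] || continue   # skip empty files and an unmatched glob
        printf '%s ' "${f#"$base"/}"             # prefix such as "RUN1/mod1_1 "
        LC_ALL=C sort -k4,4 -g "$f" | head -n 1  # smallest column-4 line
    done > "$out"                                # truncated once, then filled
}
```

Because `> "$out"` applies to the whole loop, rerunning the script replaces the old results instead of piling new lines onto them, which is what `>>` inside a loop would do.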

StatsSorceress

2 Answers


For each of these files (1_1 or 1_2), the following command should give you the row with the lowest number in the 4th column of that file:

~]$ cat 1_2
8.69e-01 2.59e-01 7.82e-01 4.98e-01
8.69e-01 2.56e-01 7.84e-01 4.90e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
8.73e-01 2.53e-01 7.81e-01 4.99e-01

Now use sort -k:

~]$ sort -k 4 1_2 | head -1
8.69e-01 2.56e-01 7.84e-01 4.90e-01

Without head -1, you can see that the lines are sorted by the 4th column:

~]$ sort -k 4 1_2
8.69e-01 2.56e-01 7.84e-01 4.90e-01
8.69e-01 2.59e-01 7.82e-01 4.98e-01
8.73e-01 2.53e-01 7.81e-01 4.99e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
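If you only need the minimum line, sorting the whole file is not strictly necessary. As an alternative technique (a single awk pass rather than sort), here is a sketch; the function name `min_col4_awk` is hypothetical. Adding 0 to the field forces a numeric comparison, so exponent notation is compared by magnitude:

```shell
#!/usr/bin/env bash
# Single pass over one file: remember the line whose column 4 is smallest.
# "$4 + 0" converts the field to a number (4.95e-02 -> 0.0495).
# Ties keep the first such line; an empty file prints nothing.
min_col4_awk() {
    LC_ALL=C awk 'NR == 1 || $4 + 0 < min { min = $4 + 0; line = $0 }
                  END { if (NR > 0) print line }' "$1"
}
```

For example, `min_col4_awk 1_2` should print the same row as `sort -k 4 1_2 | head -1` above. Since awk reads the file to the end, there is also no early-closed pipe between two processes.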

EDIT

#!/bin/bash
resultfile="somefile.txt"
for d in $(find . -type d -name 'RUN*');
do
  find $d -type f -name 'mod*' -exec sort -k4 -g {} \; | head -1 >> "$resultfile"
done
iamauser
  • And how does that work with many numbered directories, not just 2? – StatsSorceress Mar 06 '17 at 20:00
  • I get a lot of this error: `./testagain.sh: line 5: : No such file or directory find: sort terminated by signal 13` – StatsSorceress Mar 06 '17 at 20:21
  • You are not defining "$resultfile" which is causing those errors. See my edit one more time. – iamauser Mar 06 '17 at 20:38
  • Mostly there, but still getting `find: 'sort' terminated by signal 13`, and the resulting file has a line from a file I wasn't expecting. – StatsSorceress Mar 06 '17 at 20:42
  • Okay, I still have that error but the solution is to change one line: `find $d -type f -name 'mod*' -exec sort -k4 -g {} \; | head -1 >> "$resultfile"` If you write that up I'll accept your answer! – StatsSorceress Mar 06 '17 at 20:45
  • The proposed solution in the *EDIT* of the accepted answer sorts the files one by one, which means that the smallest value of the first sorted file will be returned, instead of the smallest for all files. If the goal is to return the smallest of all files of a given group, those files have to be sorted all at once (e.g. using `{} +`). I've added more details here: http://stackoverflow.com/a/42720248/4535717 – dgeorgiev Mar 14 '17 at 11:20
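If, as the last comment suggests, you want a single line per directory (the smallest column-4 line across all of its mod* files), a sketch along these lines might work. It assumes every file ends with a newline, and `min_line_in_dir` is a hypothetical name. Concatenating with `cat` first means one `sort` sees every line, even if `find` splits a very long file list over several `cat` invocations (with `-exec sort {} +`, each batch would be sorted separately):

```shell
#!/usr/bin/env bash
# Smallest column-4 line across ALL mod* files below one directory.
# Assumption: each file ends with a newline, otherwise cat would glue
# the last line of one file onto the first line of the next.
min_line_in_dir() {
    find "$1" -type f -name 'mod*' -exec cat {} + |
        LC_ALL=C sort -k4,4 -g | head -n 1
}
```

Usage might look like `for d in RUN*/; do min_line_in_dir "$d"; done > overall_min.txt`, where `overall_min.txt` is again only an example name.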

There are a couple of problems:

1. Use $j instead of j in the sort command.
2. Quote your variables in echo (see How do I preserve line breaks when storing a command output to a variable in bash? for details).
3. You cd into a directory but never go back; you probably want to cd back out before the next iteration.
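To illustrate point 2 on a tiny input: command substitution keeps the newlines in the output, but an unquoted `$s1` is word-split before `echo` sees it, so all lines collapse into one. The sample strings below are made up for the demonstration:

```shell
#!/usr/bin/env bash
# Two-line command output stored in a variable.
s1=$(printf '%s\n' '4.92e-01 a' '4.90e-01 b')

unquoted=$(echo $s1)    # word splitting: "4.92e-01 a 4.90e-01 b" on one line
quoted=$(echo "$s1")    # quoted: both lines are kept

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted:\n%s\n' "$quoted"
```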

I tested a simpler version of your code (without going into directories), and it works:

#!/bin/bash

findfiles=$(find . -type f -name 'mod*' | sort -V) # find the files
for j in $findfiles; do
    echo "$j"
    s1=$(sort -k 4 -g "$j")
    echo "$s1"
done

Note that I used sort -g so that floating-point values in exponent notation are compared properly. For example, if you change your data to this (using 4.95e-02 instead of 4.95e-01 in the second row):

8.69e-01 2.59e-01 7.82e-01 4.92e-01
8.69e-01 2.56e-01 7.84e-01 4.95e-02
8.73e-01 2.53e-01 7.81e-01 4.99e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01

then without -g the order will be wrong:

$ cat test.dat | sort -k 4
8.69e-01 2.59e-01 7.82e-01 4.92e-01
8.69e-01 2.56e-01 7.84e-01 4.95e-02
8.73e-01 2.53e-01 7.81e-01 4.99e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01

With -g instead, the exponent is taken into account and the order is correct:

$ cat test.dat | sort -k 4 -g
8.69e-01 2.56e-01 7.84e-01 4.95e-02
8.69e-01 2.59e-01 7.82e-01 4.92e-01
8.73e-01 2.53e-01 7.81e-01 4.99e-01
8.72e-01 2.54e-01 7.83e-01 5.00e-01
8.71e-01 2.53e-01 7.84e-01 5.01e-01
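Both scripts in this thread loop over an unquoted `$findfiles`, which splits any file name containing a space into several words. A sketch of a sturdier loop, assuming GNU find and GNU sort (for `-print0`, `-z` and `-V`); `print_min_lines` is a hypothetical name:

```shell
#!/usr/bin/env bash
# NUL-separated file names survive spaces and newlines in the names.
# sort -zV keeps the natural order (mod1_2 before mod1_10).
print_min_lines() {
    find "${1:-.}" -type f -name 'mod*' -print0 | sort -zV |
        while IFS= read -r -d '' f; do
            LC_ALL=C sort -k4,4 -g "$f" | head -n 1   # smallest column-4 line
        done
}
```

Redirecting the call, e.g. `print_min_lines RUN1 > RUN1_min.txt` (example file name), writes the result to a file.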
andipla