Bash: find and path

Question

As an extension of this question, I would now like to get not only the filename but also the directories up to k positions back. Here's the problem:

I have directories named RUN1, RUN2, and RUN3. Each directory has some files. Directory RUN1 has files mod1_1.csv, mod1_2.csv, mod1_3.csv. Directory RUN2 has files mod2_1.csv, mod2_2.csv, mod2_3.csv, etc.

The contents of mod1_1.csv file look like this:

5.71 6.66 5.52 6.90
5.78 6.69 5.55 6.98
5.77 6.63 5.73 6.91

And mod1_2.csv looks like this:

5.73 6.43 5.76 6.57
5.79 6.20 5.10 7.01
5.71 6.21 5.34 6.81

In RUN2, mod2_1.csv looks like this:

5.72 6.29 5.39 5.59
5.71 6.10 5.10 7.34
5.70 6.23 5.23 6.45

And mod2_2.csv looks like this:

5.72 6.29 5.39 5.69
5.71 6.10 5.10 7.32
5.70 6.23 5.23 6.21

My goal is to find, for each RUN* directory, the line with the smallest value in column 4, and to write that line, the model file it came from, and part of the path to a new .csv file. Right now, I have this code:

#!/bin/bash
resultfile="best_results_mlp_onelayer.txt"
for d in $(find . -type d -name 'RUN*' | sort)
do
    find "$d" -type f -name 'mod*' -exec awk '{print $0, FILENAME}' {} \; | sort -k4 -g | head -1 >> "$resultfile"
done

This gives me:

5.73 6.43 5.76 6.57 ./RUN_1/mod1_2.csv
5.72 6.29 5.39 5.59 ./RUN_2/mod2_1.csv

But I would like a .csv file with these contents:

5.73 6.43 5.76 6.57 ./DIR1/DIR2/DIR3/RUN_1/mod1_2.csv
5.72 6.29 5.39 5.59 ./DIR1/DIR2/DIR3/RUN_2/mod2_1.csv

where my pwd is /DIRk/DIRm/DIRl/DIR1/DIR2/DIR3

EDIT:

Based on a reply, what I mean by 'k positions back' is:

Right now, my code gives me ./RUN_1/mod1_2.csv as the last column value in the first row. To me, that is a pwd 'one position back', because it shows the directory where the file mod1_2.csv is located. I would like the path '4 positions back'. That is, I would like ./DIR1/DIR2/DIR3/RUN_1/mod1_2.csv. I said 'k' because that's a common placeholder, and I was hoping I could just substitute a number in there.

score 1 · Answer 1 · answered Mar 14 '17 at 20:33

Following dgeorgiev's answer, I moved my results-gathering script to a directory further up the hierarchy. Continuing from my question: my pwd was /DIRk/DIRm/DIRl/DIR1/DIR2/DIR3, so I moved my .sh file to /DIRk/DIRm/DIRl and ran this from there:

#!/bin/bash
resultfile="best_results_mlp.txt"

for d in $(find . -type d -name 'RUN*' | sort)
do
    find "$d" -type f -name 'mod*' -exec awk '{print $0, FILENAME}' {} \; | sort -k4 -g | head -1 >> "$resultfile"
done

And the result was, as desired:

5.73 6.43 5.76 6.57 ./DIR1/DIR2/DIR3/RUN_1/mod1_2.csv
5.72 6.29 5.39 5.59 ./DIR1/DIR2/DIR3/RUN_2/mod2_1.csv
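If moving the script is not convenient, one possible alternative is to run find on an absolute path and then trim each result down to its last k components. The following is only a sketch: the function name keep_last_components is made up for illustration, and it assumes paths contain no newlines.

```shell
#!/bin/bash
# keep_last_components K PATH  (hypothetical helper, not from the thread)
# Prints "./" followed by the last K slash-separated components of PATH.
keep_last_components() {
    local k="$1" path="$2"
    printf '%s\n' "$path" | awk -F/ -v k="$k" '{
        start = NF - k + 1            # index of the first component to keep
        if (start < 1) start = 1      # K larger than the path: keep everything
        out = "."
        for (i = start; i <= NF; i++)
            if ($i != "") out = out "/" $i   # skip the empty field before a leading "/"
        print out
    }'
}

# Example:
#   keep_last_components 5 /DIRk/DIRm/DIRl/DIR1/DIR2/DIR3/RUN_1/mod1_2.csv
# prints ./DIR1/DIR2/DIR3/RUN_1/mod1_2.csv
```

Here k counts the file name as well, so the four directories plus the file in the question correspond to k=5.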

score 0 · Answer 2 · answered Mar 13 '17 at 20:41

I don't see any commas in these CSVs. I assume you're just separating by whitespace. And since you're already using awk in your find line, I guess we can assume that you're open to awk-based options.

$ find . -type f
./RUN1/mod1_1
./RUN1/mod1_2
./RUN2/mod2_1
./RUN2/mod2_2
$ awk 'NR == 1 {n=$4} $4 > n {n=$4; f=FILENAME} END {print f,n}' RUN*/mod*
RUN2/mod2_1 7.34

This uses the awk built-in variable FILENAME, which holds the name of the file currently being read.

I can't tell from your question what you mean by "k positions back", but you can strip or parse this output however you see fit.
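Note that the one-liner above, as written, tracks the largest column-4 value across all files, whereas the question asks for the smallest value per RUN* directory. A minimal sketch of that variant follows; the function name best_per_dir is hypothetical, and it assumes whitespace-separated numeric columns as in the sample data.

```shell
#!/bin/bash
# best_per_dir FILE...  (hypothetical helper, not from the thread)
# For each directory among the given files, prints the row with the
# smallest 4th column, followed by the name of the file it came from.
best_per_dir() {
    awk '
        {
            dir = FILENAME
            # Strip the trailing "/name" to get the directory; files given
            # without any slash are grouped under ".".
            if (!sub("/[^/]*$", "", dir)) dir = "."
            if (!(dir in min) || $4 + 0 < min[dir]) {
                min[dir]  = $4 + 0            # force numeric comparison
                best[dir] = $0 " " FILENAME   # remember the whole row plus its file
            }
        }
        END { for (d in best) print best[d] }
    ' "$@" | sort -k5    # awk array order is unspecified, so sort by file path
}
```

Called as best_per_dir RUN*/mod*, this should pick the 6.57 row from mod1_2.csv and the 5.59 row from mod2_1.csv for the sample files in the question. The path printed is whatever path was passed in, so passing longer paths gives longer paths in the output.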

Thank you @ghoti, but how could I integrate this into what I already have? — StatsSorceress, Mar 13 '17 at 21:01

dgeorgiev · Accepted Answer · 2017-03-14T10:22:28.920

Further to my answer in your previous question:

find reports each file with the path under which it was found, starting from the search root you gave it. So if you search in "/path/to/$d", you will get "/path/to/$d/filename.csv". Just make find search in the path you would like to see in the output.

So if your RUN* dirs are located in /path/to/, and you would like to have ./to/RUNx/filename.csv in your results, you can do:

cd /path/ && find ./to/RUNx/ # ...

If you need the absolute path, just run find on /path/to/RUNx.

Just be careful when changing directories, and make sure to change back afterwards if necessary (e.g., you might have to give the path to your output file explicitly).
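One way to avoid having to change back is to do the cd inside a subshell, so the caller's working directory is never touched. This is only a sketch under that assumption; the function name find_relative_to and its argument order are invented for illustration.

```shell
#!/bin/bash
# find_relative_to BASE SUBDIR [find expressions...]  (hypothetical helper)
# Runs find on ./SUBDIR from inside BASE, so results are printed relative
# to BASE, e.g. ./to/RUNx/filename.csv for BASE=/path and SUBDIR=to.
find_relative_to() {
    local base="$1" sub="$2"
    shift 2
    # The parentheses start a subshell: the cd only applies inside it,
    # and the calling shell stays in its original directory.
    ( cd "$base" && find "./$sub" "$@" )
}
```

For example, find_relative_to /path to -type f -name 'mod*' would list files as ./to/RUNx/... while leaving the current directory unchanged. Because the directory does not change in the caller, a relative output file such as "$resultfile" still refers to the original location.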

What solved this was actually a combination of your answers. I'll post it as a separate answer below. Thank you for your help! — StatsSorceress, Mar 14 '17 at 20:29

jil · Answer 4 · score -1 · 2017-03-13T21:38:05.020

How about something like this

find . -type d -name 'RUN*' | while read -r dir; do
    awk '{print $0, FILENAME}' "$dir"/mod* \
    | sort -k4 -g | head -1
done

(Sorry about my original misinterpretation of your requirements; edited to correct the issue.)

edited Mar 13 '17 at 21:38

answered Mar 13 '17 at 20:29

Thank you @jil, but that does not solve the problem. I still need the directories. – StatsSorceress Mar 13 '17 at 20:37
