I have many files, each in a directory. My script should:

  • Find a string in a file. Let's say the file is called "results" and the string is "average."

  • Then append everything else on the string's line to another file called "allResults." After running the script, the file "allResults" should contain as many lines as there are "results" files, like

allResults.txt (what I want):

Everything on the same line as the string, "average" in directory1/results
Everything on the same line as the string, "average" in directory2/results
Everything on the same line as the string, "average" in directory3/results
...
Everything on the same line as the string, "average" in directory-i/results

My script finds what I need. I checked by running "cat" on "allResults.txt" while the script was running and "ls -l" on its parent directory: I can see the output of the "find" on my screen, and the size of "allResults.txt" increases briefly, then goes back to 0. The problem is that "allResults.txt" is empty when the script has finished, so the results of the "find" seem to be overwritten rather than appended. Here is my script (I use "gsed", GNU sed, because I'm on macOS Sierra):

#!/bin/bash

# Loop over all directories, find.
let allsteps=100000
for ((step=0; step <= allsteps; step++)); do
    i=$((step));

    findme="average"
    find ${i}/experiment-1/results.dat -type f -exec gsed -n -i "s/${findme}//p" {} \; >> allResults.txt
done 

Please note that I have used ">>" in my example here because I read that it appends (which is what I want--a list of all lines matching my "find" from all files), whereas ">" overwrites. However, in both cases (when I use ">" or ">>"), I end up with an empty allResults.txt file.

Ant
  • Have a look at this post: https://stackoverflow.com/questions/15030563/redirecting-stdout-with-find-exec-and-without-creating-new-shell – GoinOff Jul 24 '19 at 16:19
  • @GoinOff, thanks. I did take a look and tried to implement Philipp Jone's solution. No luck. I don't really understand his syntax (how to modify my code accordingly). – Ant Jul 24 '19 at 16:45

1 Answer

grep's default behavior is to print out matching lines. Using sed is overkill.

You also don't need an explicit loop. Excess looping is a habit programmers tend to carry over from other languages. Most shell commands and constructs accept multiple file names.

grep average */experiment-1/results.dat > allResults.txt

What's nice about this is the output file is only opened once and is written to in one fell swoop.
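
For context on why the script in the question leaves the output file empty: with -i, sed writes its result back into the input file instead of to stdout, so the redirection has nothing to capture (and each results.dat is rewritten in place). If you still want sed's "rest of the line" behavior, a minimal sketch without -i could look like this. The scratch directory and the sample values are made up for the demo, and plain sed is assumed to behave like gsed for this simple substitution.

```shell
# Hypothetical demo layout, created in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/1/experiment-1" "$demo/2/experiment-1"
printf 'header\naverage 3.5\n' > "$demo/1/experiment-1/results.dat"
printf 'header\naverage 4.25\n' > "$demo/2/experiment-1/results.dat"

# Start with an empty output file.
: > "$demo/allResults.txt"

# Without -i, sed writes to stdout, so >> can capture the output.
# -n suppresses the default printing; the p flag prints only lines
# where the substitution happened, minus the word "average".
for f in "$demo"/*/experiment-1/results.dat; do
    sed -n 's/average//p' "$f" >> "$demo/allResults.txt"
done

cat "$demo/allResults.txt"
```

Each output line keeps the space that followed "average", and the input files are left untouched.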

If you do have hundreds of thousands of files to process, you might hit the command-line length limit. If that happens, you can switch to a find call, which makes sure not to call grep with too many files at once.

find . -name results.dat -exec grep average {} + > allResults.txt
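
One thing to be aware of: when grep is handed more than one file, it prefixes every matching line with the file name. The sketch below shows both forms. It assumes a grep that supports -h (GNU and BSD grep both do); the scratch directory, the sample values, and the output file names withNames.txt and linesOnly.txt are made up for the demo.

```shell
# Hypothetical demo layout, created in a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/1/experiment-1" "$demo/2/experiment-1"
printf 'average 3.5\n' > "$demo/1/experiment-1/results.dat"
printf 'average 4.25\n' > "$demo/2/experiment-1/results.dat"
cd "$demo" || exit 1

# Several files per grep call: each matching line is prefixed "path:".
find . -name results.dat -exec grep average {} + > withNames.txt

# -h suppresses the file name prefix, leaving only the matching lines.
find . -name results.dat -exec grep -h average {} + > linesOnly.txt

cat withNames.txt linesOnly.txt
```

Keeping the prefix is handy if you want to sort by directory afterwards; drop it with -h if you only need the values.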
John Kugelman
  • thank you. It will take me a while to digest this. I will try it. Yes, I'm used to C++ and find it hard not to deal with loops. – Ant Jul 24 '19 at 16:29
  • thank you--I tried your suggestions. The first one leads to grep: */i/experiment-1/results.dat: No such file or directory. The second one, unfortunately, works with no errors, but "allResults.txt" is empty. – Ant Jul 24 '19 at 18:13
  • Run `find -name results.dat` to print out all files it finds. Is it finding any? – John Kugelman Jul 24 '19 at 18:30
  • I did it, John. The result is "find: illegal option -- n." Interesting! – Ant Jul 24 '19 at 21:52
  • Oh, a Mac. I missed that. There needs to be a `.` in there. See my updated answer. Try `find . -name results.dat` to see if any files are found. – John Kugelman Jul 24 '19 at 22:24
  • Thank you, John. Phew. Yes, that gave me a complete list of "./${i}/experiment-1/results.dat." I am relieved. Last night I got into reading about the different version of "grep" for Mac available via a Homebrew download. I got it and still had no success and was beginning to wonder what on Earth was wrong. – Ant Jul 25 '19 at 07:46
  • I think I will try a loop... The numbers are ending up in the wrong order in my text file. I'd like (for example) the number from directory 1 to be at the top, followed by directory 2. So I'll try a loop and see if that works. At any rate, thank you very much. I think this will work after I experiment a little. – Ant Jul 25 '19 at 08:32
  • Thanks so much, John. It took a while, but it looks like everything I need has been printed to the file via your "find" solution. I wish there were a way to send people beverage vouchers here. Really appreciate your help. Like I said, the numbers are in the wrong order in the file, but the path/directory information has been printed to the file, so I will look for a way to sort them. I'm just happy to have them at all. Thank you so much. – Ant Jul 25 '19 at 08:58
  • Play around with `sort`. It has lots of flags. `find ... | sort > allResults.txt`. – John Kugelman Jul 25 '19 at 12:07
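
A possible way to get the lines in directory order, as a sketch only: it assumes the directories are named with plain numbers, as in the question, so that find prints paths like ./10/experiment-1/results.dat. The scratch directory and the sample values are made up for the demo.

```shell
# Hypothetical demo layout: directories 10, 2 and 1 in a scratch directory.
demo=$(mktemp -d)
for n in 10 2 1; do
    mkdir -p "$demo/$n/experiment-1"
    printf 'average %s.5\n' "$n" > "$demo/$n/experiment-1/results.dat"
done
cd "$demo" || exit 1

# With "/" as the field separator, field 1 is "." and field 2 is the
# directory number; -k 2,2n sorts on that field alone, numerically.
find . -name results.dat -exec grep average {} + |
    sort -t / -k 2,2n > allResults.txt

cat allResults.txt
```

A plain `sort` would compare the paths as text and put ./10 before ./2, which is why the numeric key is used here.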