
I have 1000 text files, each tab-delimited, in the following format:

John    32     NY     12     USA
Peter   78.    CA.    8.     USA
Stef.   67.    CA.    12.    USA

I want to extract all lines where the fourth column is exactly 12. This is what I've done:


file='random'

FILES=/home/user/data/*.txt
for f in $FILES; 
do 
echo $f
filename=$(basename $f)
awk -F"\t" '$4 == 12' $f >  /home/user/extra/$file/$filename; 
done

But this produces empty files, and I am not sure what I am doing wrong. Any insights would be appreciated.

John

1 Answer


Please read "Correct Bash and shell script variable capitalization" and https://mywiki.wooledge.org/Quotes to understand some of the issues in your script, and copy/paste any shell script you write into https://www.shellcheck.net/ until you get the fundamentals down.

Regarding "But this produces empty files": sure, for any given command cmd with

for f in *; do
    cmd "$f" > "out$f"
done

you're creating an output file for each input file in the shell loop, because the shell creates the redirection target before cmd even runs. So if no line of an input file matches $4==12 in your awk script (the cmd in this case), you'll still get an output file; it'll just be empty. If you don't want that you could do:

tmp=$(mktemp)
for f in *; do
    cmd "$f" > "$tmp" &&
    mv -- "$tmp" "out$f"
done

and write cmd to exit with a success/fail status like grep does when it finds a match (trivial in awk), or you could check the size of "$tmp" before the mv:

tmp=$(mktemp)
for f in *; do
    cmd "$f" > "$tmp" &&
    [[ -s "$tmp" ]] &&
    mv -- "$tmp" "out$f"
done
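
The exit-status variant mentioned above could be sketched as follows. This is only an illustration under assumptions: the function name filter12, the out_ prefix, and the two sample files in a scratch directory are made up for the demo and are not part of the original script.

```shell
# Hypothetical helper: print rows whose 4th tab-separated field equals 12
# and exit 0 only if at least one row matched (the way grep reports a match).
filter12() {
    awk -F'\t' '$4 == 12 { print; found = 1 } END { exit !found }' "$1"
}

# Scratch directory with two made-up input files, one with a match, one without.
workdir=$(mktemp -d)
printf 'John\t32\tNY\t12\tUSA\n' > "$workdir/a.txt"
printf 'Peter\t78\tCA\t8\tUSA\n' > "$workdir/b.txt"

tmp=$(mktemp)
for f in "$workdir"/*.txt; do
    # mv only runs when filter12 succeeded, i.e. when something matched.
    filter12 "$f" > "$tmp" &&
    mv -- "$tmp" "$workdir/out_$(basename "$f")"
done
rm -f -- "$tmp"   # remove the leftover temp file if the last input had no match
```

With these sample files, only a.txt contains a matching row, so only out_a.txt should be created.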

You don't need a shell loop or other commands for this, though, just one call to awk to process all of your files at once. Using any awk in any shell on every Unix box, do only this:

awk -v file='random' -F'\t' '
    FNR == 1 {
        close(out)
        f = FILENAME
        sub(".*/","",f)
        out = "/home/user/extra/" file "/" f
    }
    $4 == 12 {
        print > out
    }
' /home/user/data/*.txt
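
To try the single-call approach without touching real data, a sandboxed run might look like the sketch below. The scratch directories, the sample file names one.txt and two.txt, and the outdir variable are assumptions made for the demo; they stand in for /home/user/data and /home/user/extra/random.

```shell
src=$(mktemp -d)   # stands in for /home/user/data
dst=$(mktemp -d)   # stands in for /home/user/extra/random

# one.txt has a row with 12 in the 4th column, two.txt does not.
printf 'John\t32\tNY\t12\tUSA\nPeter\t78\tCA\t8\tUSA\n' > "$src/one.txt"
printf 'Peter\t78\tCA\t8\tUSA\n' > "$src/two.txt"

awk -v outdir="$dst" -F'\t' '
    FNR == 1 {                 # first line of each input file
        close(out)             # close the previous output file, if any
        f = FILENAME
        sub(".*/", "", f)      # strip the directory part
        out = outdir "/" f
    }
    $4 == 12 { print > out }   # the output file is created only on a match
' "$src"/*.txt

ls "$dst"
```

Because awk opens an output file only when it first prints to it, an input with no matching rows should leave no file behind in the output directory.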

If you want a string comparison instead of a numeric one, so that 12. doesn't match 12, use $4 == "12" instead of $4 == 12.
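
The difference between the two comparisons can be checked with a small experiment. The sketch below uses two made-up rows modelled on the sample data in the question; the variable names are illustrative only.

```shell
# Two sample rows: the 4th field is "12" in the first and "12." in the second.
rows=$(printf 'John\t32\tNY\t12\tUSA\nStef.\t67.\tCA.\t12.\tUSA\n')

# Numeric comparison: a field that looks like a number is compared by value.
numeric=$(printf '%s\n' "$rows" | awk -F'\t' '$4 == 12   { print $1 }')

# String comparison: the field must be exactly the two characters "12".
string=$(printf '%s\n' "$rows" | awk -F'\t' '$4 == "12" { print $1 }')

printf 'numeric match:\n%s\n' "$numeric"
printf 'string match:\n%s\n' "$string"
```

The numeric test should report both John and Stef., while the string test should report only John.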

In the above, file is a poor choice of variable name to hold the name of a directory, but I left it alone to avoid changing anything I didn't have to.

Ed Morton