Across several files, I would like to extract the lines (with their line numbers)

  • that contain the pattern ClNonZ
  • and that have the value "REAL" as the first attribute (the second column).

For a single file, the line breaks in the output are preserved.

But I have several files, so I use a "for" loop, and then the multiple matches from each file are printed on a single line, without line breaks.

Example:

$ cat foo1.txt
A TEST 0.959660297 0 0.021231423 -0.0073 -0.0031 MhZisp
B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
C REAL 0.98089172 0 0 -0.0158 0.0124 MhNonZ
D TEST 0.704883227 0.265392781 0.010615711 -0.0087 -0.0092 MhZisp
E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
F TEST 0.704883227 0.265392781 0.010458211 0.0865 0.0548 ClNonZ

$ cat foo2.txt
A TEST 0.715498938 0 0.265392781 -0.0013 -0.0309 Unkn
B REAL 0.927813163 0 0.053078556 -0.0051 -0.0636 MhZisp
C TEST 0.55626327 0.222929936 0.201698514 0.0053 -0.0438 MhZisp
D REAL 0.492569002 0.350318471 0.138004246 0.0485 0.0088 ClNonZ
E REAL 0.704883227 0.265392781 0.010615711 0.0476 0.0061 AbbbbZ
F REAL 0.180467091 0.800424628 0 0.0566    0.0103  ClNonZ

grep without a loop: the result is fine, with line breaks:

$  grep -n ClNonZ foo1.txt  | awk '$2 == "REAL" {print $0}'

2:B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
5:E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ

grep in a for loop: bad presentation, the line breaks have disappeared:

$  for file in `ls foo*` ; do line=`grep -n ClNonZ $file | awk '$2 == "REAL" {print $0}' `; if [[ -n "$line" ]]; then  echo $file ; echo $line ; echo " " ; fi ; done

foo1.txt
2:B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ 5:E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
 
foo2.txt
4:D REAL 0.492569002 0.350318471 0.138004246 0.0485 0.0088 ClNonZ 6:F REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ

I tried to use "while" instead of "for" (as explained in http://mywiki.wooledge.org/BashFAQ/001 and as suggested by @chepner), without success.

Would you have any idea that could help me, please?

anton
  • Btw: See: [Iterating over ls output is fragile. Use globs.](https://www.shellcheck.net/wiki/SC2045) and [Bash Pitfall #1](http://mywiki.wooledge.org/BashPitfalls#pf1) – Cyrus Oct 23 '22 at 21:00
  • Question is misleading. `grep` and `for` loop are not the cause of the problem. The last step, you added a subshell to capture the output to a variable and then you use `echo` on that variable. That is the cause of the line breaks to disappear. – Stephen Quan Oct 23 '22 at 21:03
  • As the `bash` tag you used instructs - "For shell scripts with syntax or other errors, please check them at https://shellcheck.net before posting here." – Ed Morton Oct 24 '22 at 01:50

1 Answer

The primary problem here is that you didn't double-quote your variable references, especially in echo $line (should be echo "$line"). This often causes problems like this. See "I just assigned a variable, but echo $variable shows something else" and "When should I double-quote a parameter expansion?" (short answer: almost always).
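
For illustration, here is a minimal sketch of the original loop with the quoting fixed, assuming bash and the sample files from the question. The helper name report_matches and the scratch directory are assumptions made for this example, not part of the original code. It also iterates over a glob instead of `ls` output, as suggested in the comments.

```shell
#!/usr/bin/env bash
# Sample data copied from the question, written to a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

cat > foo1.txt <<'EOF'
A TEST 0.959660297 0 0.021231423 -0.0073 -0.0031 MhZisp
B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
C REAL 0.98089172 0 0 -0.0158 0.0124 MhNonZ
D TEST 0.704883227 0.265392781 0.010615711 -0.0087 -0.0092 MhZisp
E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
F TEST 0.704883227 0.265392781 0.010458211 0.0865 0.0548 ClNonZ
EOF

cat > foo2.txt <<'EOF'
A TEST 0.715498938 0 0.265392781 -0.0013 -0.0309 Unkn
B REAL 0.927813163 0 0.053078556 -0.0051 -0.0636 MhZisp
C TEST 0.55626327 0.222929936 0.201698514 0.0053 -0.0438 MhZisp
D REAL 0.492569002 0.350318471 0.138004246 0.0485 0.0088 ClNonZ
E REAL 0.704883227 0.265392781 0.010615711 0.0476 0.0061 AbbbbZ
F REAL 0.180467091 0.800424628 0 0.0566    0.0103  ClNonZ
EOF

# report_matches is a hypothetical helper name used only in this sketch.
report_matches() {
  local file lines
  for file in "$@"; do            # iterate over the arguments, not over `ls` output
    [[ -f $file ]] || continue    # skip an unmatched glob left as a literal string
    lines=$(grep -n ClNonZ "$file" | awk '$2 == "REAL"')
    if [[ -n $lines ]]; then
      # Quoting "$lines" keeps the embedded newlines (and repeated spaces) intact.
      printf '%s\n%s\n\n' "$file" "$lines"
    fi
  done
}

report_matches foo*.txt
```

The only changes that matter for the line breaks are the double quotes around the expansions; printf is used here instead of echo simply to emit the filename, the matches, and a blank separator in one call.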

Shellcheck.net is good at pointing out common mistakes like this, and will also have some other good recommendations for your code. I recommend using it!

However, in this case, I'd be tempted to replace the entire bash+grep+awk thing, since awk can do it all itself:

awk 'FNR==1 {needheader=1}; ($0 ~ /ClNonZ/ && $2 == "REAL") {if (needheader) {print ""; print FILENAME; needheader=0}; print}' foo*.txt

Explanation:

  1. FNR==1 {needheader=1} -- this triggers at the beginning of each file (FNR is the line number within the current file, so if it's 1 this is the beginning of a file) and sets a variable saying that if there's a match, the filename needs to be printed.
  2. ($0 ~ /ClNonZ/ && $2 == "REAL") -- if "ClNonZ" appears in the line, and the second field is "REAL", then do the following stuff in { }. Note: do you actually want to search the entire line for "ClNonZ", or just the last field? If it's just the last field, use $NF == "ClNonZ" instead.
  3. if (needheader) {print ""; print FILENAME; needheader=0} -- if this is the first match within this file, print a blank line and the filename, then clear the variable that says this stuff needs to be printed.
  4. print -- ...and print the line. Note that $0 is implicit here, and since this is still in the { } from step 2, it only happens if the line matched.
  5. foo*.txt -- just pass all the matching filenames to awk as arguments, and let it scan over all of them in a big batch.
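
The steps above can be sketched as a multi-line script. This variant is an assumption-laden illustration rather than the one-liner itself: it uses the $NF == "ClNonZ" test from the note in step 2, and it prefixes each match with FNR so the line number requested in the question is shown. The function name awk_report and the scratch directory are hypothetical.

```shell
#!/usr/bin/env bash
# Sample data copied from the question, written to a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

cat > foo1.txt <<'EOF'
A TEST 0.959660297 0 0.021231423 -0.0073 -0.0031 MhZisp
B REAL 0.180467091 0.800424628 0 0.0566 0.0103 ClNonZ
C REAL 0.98089172 0 0 -0.0158 0.0124 MhNonZ
D TEST 0.704883227 0.265392781 0.010615711 -0.0087 -0.0092 MhZisp
E REAL 0.010615711 0.959660297 0.010615711 0.0476 0.0061 ClNonZ
F TEST 0.704883227 0.265392781 0.010458211 0.0865 0.0548 ClNonZ
EOF

cat > foo2.txt <<'EOF'
A TEST 0.715498938 0 0.265392781 -0.0013 -0.0309 Unkn
B REAL 0.927813163 0 0.053078556 -0.0051 -0.0636 MhZisp
C TEST 0.55626327 0.222929936 0.201698514 0.0053 -0.0438 MhZisp
D REAL 0.492569002 0.350318471 0.138004246 0.0485 0.0088 ClNonZ
E REAL 0.704883227 0.265392781 0.010615711 0.0476 0.0061 AbbbbZ
F REAL 0.180467091 0.800424628 0 0.0566    0.0103  ClNonZ
EOF

# awk_report is a hypothetical wrapper name used only in this sketch.
awk_report() {
  awk '
    # Start of each input file: remember that its header is still unprinted.
    FNR == 1 { needheader = 1 }

    # Last field is exactly "ClNonZ" and second field is "REAL".
    $NF == "ClNonZ" && $2 == "REAL" {
      if (needheader) {
        print ""            # blank separator line
        print FILENAME      # name of the file currently being read
        needheader = 0
      }
      print FNR ":" $0      # per-file line number, then the untouched line
    }
  ' "$@"
}

awk_report foo*.txt
```

Because no field is modified, awk prints $0 exactly as read, so the repeated spaces in the last line of foo2.txt survive. Files without any match produce no header at all.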
Gordon Davisson