
I have a simple loop that reads an input file (.xyz) line by line and echoes each line into an output file (.gjf). However, I want it to terminate once it encounters a line containing only "Bonds", so that only the initial part of the input file is copied into the output file.

I am doing this for many input/output files (hence ${i} in the filenames), so the number of lines copied may differ from file to file, but every input file will contain a line with "Bonds" at some point. This is why I need to read line by line.

The code below will echo into the output file correctly, but won't exit the while loop when it encounters "Bonds", instead copying over the entire input file. Is the output of read -r not treated as a string?

 while read -r LINE ; do
     if [[ "$LINE" == "Bonds" ]] ; then
         break
     else
         echo "$LINE" >> "${i}.gjf"
     fi
 done < "./${i}.xyz"

Below is an example of the input (.xyz) file:

 0 2
 6        0.097055000     -1.260034000      1.473340000
 6        0.336623000     -0.000274000      0.631466000
 1        0.279787000     -2.153356000      0.870406000
 1       -0.939049000     -1.279769000      1.829917000
 1        0.757490000     -1.287905000      2.345029000
 Bonds
 1     2    S
 1     3    S
 1     4    S
 1     5    S
 2     6    S

I would only like the lines above "Bonds" to be copied. I am new to bash, so some guidance would be appreciated. Thanks.

Will
  • This works for me on your example data; I suspect there's something like nonprinting characters (or maybe DOS/Windows line endings) in your .xyz file, messing things up. Try viewing the .xyz file with `LC_ALL=C cat -vet file.xyz` -- there should be a "$" at the end of each line, but if you see any other anomalies that's a possible source of trouble. BTW, it'd also be much faster to do this with `awk '$1=="Bonds" {exit}; {print}' "${i}.xyz" >"${i}.gjf"` – Gordon Davisson Jun 27 '21 at 02:41
  • To work around the situations @GordonDavisson mentioned, you could do `if [[ "$line" =~ "Bonds" ]] ...`. This will match the word "Bonds" anywhere in the line. Or you could limit it to "starts with Bonds" with `if [[ "$line" =~ ^Bonds ]] ...` (the regex must be unquoted, otherwise `^` is matched literally). – Nic3500 Jun 27 '21 at 02:46
  • @GordonDavisson I am doing this through Git Bash via VS Code on Windows, so that is likely the issue. When I run `LC_ALL=C cat -vet file.xyz`, the lines end with "^M$", which I assume is the Windows line ending. I have run into this issue before, so I figure I can just run the input files through dos2unix. I will take a look at your suggested method, thanks for the advice. – Will Jun 27 '21 at 02:52
  • @Will This should work with Windows line endings: `awk '/^Bonds\r?/ {exit}; {print}' "${i}.xyz" >"${i}.gjf"` (unless there's a space at the beginning of lines as in your example, in which case something like `awk '/^ *Bonds\r?/ {exit}; {print}' "${i}.xyz" >"${i}.gjf"` will work). – Gordon Davisson Jun 27 '21 at 03:34
  • @Will: dos2unix is likely your friend. Also check whether Git Bash has a utility called `sed` (stream editor), which is handy too. It is similar to `awk`, which might also be part of Git Bash, but I don't know. – hakre Jun 27 '21 at 07:42
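
Pulling the comments above together: the `^M` in the `cat -vet` output is a carriage return, so each line read from the file is really "Bonds" followed by `\r`, and the `==` test never matches. Below is a minimal sketch of the original loop that tolerates this without running dos2unix first. The function name `copy_until_bonds` is hypothetical (not from the question), and the sketch assumes the stop line contains only "Bonds" plus optional surrounding whitespace.

```shell
#!/usr/bin/env bash

# Hypothetical helper: copy lines from $1 to $2, stopping at the first line
# that contains only "Bonds" (leading/trailing whitespace is ignored).
copy_until_bonds() {
    local infile=$1 outfile=$2 line
    # IFS= keeps leading spaces; "|| [[ -n $line ]]" also handles a final
    # line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        line=${line%$'\r'}    # drop a trailing carriage return (CRLF input)
        # Unquoted regex: optional whitespace, "Bonds", optional whitespace.
        if [[ $line =~ ^[[:space:]]*Bonds[[:space:]]*$ ]]; then
            break
        fi
        printf '%s\n' "$line"
    done < "$infile" > "$outfile"    # open (and truncate) the output once
}
```

It could then be called per file as `copy_until_bonds "./${i}.xyz" "${i}.gjf"`. Two behaviours differ from the original loop and may or may not be wanted: the output file is overwritten rather than appended to, and leading spaces on each line are preserved because `IFS=` is set for `read`.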
